Abstract

This paper deals with faces and facets of the family-variable polytope and the characteristic-imset polytope, which are special polytopes used in integer linear programming approaches to statistically learn Bayesian network structure. A common form of linear objectives to be maximized in this area leads to the concept of score equivalence (SE), both for linear objectives and for faces of the family-variable polytope.

We characterize the linear space of SE objectives and establish a one-to-one 
correspondence between SE faces of the family-variable polytope, the faces of 
the characteristic-imset polytope, and standardized supermodular functions. 
The characterization of SE facets in terms of extremality of the corresponding 
supermodular function gives an elegant method to verify whether an inequality 
is SE-facet-defining for the family-variable polytope. 

We also show that when maximizing an SE objective one can eliminate linear constraints of the family-variable polytope that correspond to non-SE facets. However, a counter-example shows that considering SE facets alone is not enough: one has to consider the linear inequality constraints that correspond to facets of the characteristic-imset polytope, even though they may not define facets in the family-variable mode.
1 Introduction

The motivation for our paper is statistically learning Bayesian network (BN) 
structure. Bayesian networks are popular models used in statistics [13] and 
probabilistic reasoning [17]. Acyclic directed graphs, whose nodes correspond 
to random variables in consideration, are used to describe the probabilistic 
conditional independence structures behind the statistical models [18]. 

Specifically, our motivation comes from the integer linear programming (ILP) approach to the statistical learning task of determining the structural model on the basis of observed data. Nowadays, the most popular is the score-based approach, which consists in maximizing a scoring criterion $G \mapsto Q(G,D)$, where $G$ is an acyclic directed graph, $D$ the observed database, and the value $Q(G,D)$ says how well the BN structure defined by the graph $G$ explains the occurrence of the database $D$ [14].

The point of the ILP approach is that the criteria used in practice can be viewed as (the restriction of) affine functions of suitable vector representatives of BN structures, typically of acyclic directed graphs. The most common is the family-variable vector representation of the graphs, suggested independently in [12] and [7]. Very good running times have recently been achieved using this vector representation and the branch-and-cut approach [1,9]. The corresponding family-variable polytope, defined as the convex hull of these vector representatives, is one of the topics of interest in this paper.

Another ILP approach, based on the characteristic-imset vector representation of BN structures, was suggested in [11]; its motivational sources date back to [18]. Unlike the family-variable vectors, the characteristic imsets uniquely correspond to BN structures. This ILP approach is also feasible [22], but it has not resulted in better running times than those achieved using the GOBNILP software [9]. The other polytope studied in this paper is the characteristic-imset polytope, defined as the convex hull of all characteristic imsets.

Our paper is devoted to the comparison of the facet-defining inequalities for the two above-mentioned polytopes, because such inequalities appear to be the most useful ones in the cutting-plane approach to solving ILP problems [24]. Some earlier results on this comparison appeared in [21]; the present paper brings further and deeper findings.

The structure of the paper is as follows. In § 2 we introduce our notation 
and recall basic concepts; elementary facts on polytopes we need later are 
gathered in § 3. Some fundamental observations on facets of the family-variable 
polytope, on which our later considerations are based, are in § 4; some of these 
facts are also shown using different arguments in a parallel paper [10]. 

In § 5 we pinpoint the concept of score equivalence (SE), both for linear objectives to be maximized and for faces of the family-variable polytope. We characterize the linear space of SE objectives in § 6. Later, in § 7, we establish a one-to-one correspondence between SE faces of the family-variable polytope and standardized supermodular set functions. The most useful result seems to be the characterization of SE facets as those that correspond to extreme supermodular functions.

Section 8 deals with the well-known (generalized) cluster inequalities, which are predominantly applied in contemporary ILP approaches to BN structure learning. We find the corresponding supermodular functions and show they are extreme. This gives a simple proof that the generalized cluster inequalities are facet-defining
for the family-variable polytope; note that another proof of this fact, based 
on different arguments, will appear in [10]. We also interpret the generalized 
cluster inequalities in terms of connected uniform matroids. 

Another one-to-one correspondence between SE faces of the family-variable 
polytope and faces of the characteristic-imset polytope is established in § 9; to 
illustrate this correspondence we derive the form of cluster inequalities in the 
characteristic-imset mode. A few simple examples are given in § 10. 

Further important observations are in § 11: when maximizing an SE objective, one actually need not apply those linear facet-defining constraints on the family-variable polytope that are not SE. On the other hand, considering only SE facets is not enough, as a counter-example in § 12 shows. Thus, in § 11 we also reveal the hidden importance of the linear constraints that correspond to facets of the characteristic-imset polytope.

The appendix contains the proof of an auxiliary combinatorial identity 
(§A), the catalogue of SE facets in case of four BN variables (§ B) and the 
catalogue of remaining facets of the characteristic-imset polytope in case of 
four BN variables (§C). 


2 Notation and basic concepts 

Let $N$ be a finite non-empty set of BN variables; $n := |N| < \infty$, and consider the non-trivial case $n \geq 2$. Let $\mathrm{DAGS}(N)$ denote the collection of acyclic directed graphs over $N$, that is, such graphs having $N$ as the set of nodes. An example of such a graph is the empty graph, which is a graph over $N$ without adjacencies. By a full graph we will mean any acyclic directed graph over $N$ in which every pair of distinct nodes is adjacent. Given $G \in \mathrm{DAGS}(N)$ and $a \in N$, the symbol $\mathrm{pa}_G(a) := \{b \in N : b \to a \text{ in } G\}$ will denote the parent set of the node $a$. A well-known equivalent definition of acyclicity of a directed graph $G$ over $N$ is the existence of a total order $a_1, \ldots, a_n$ of the nodes in $N$ such that, for every $i = 1, \ldots, n$, $\mathrm{pa}_G(a_i) \subseteq \{a_1, \ldots, a_{i-1}\}$; we say then that the order and the graph are consonant. An immorality in $G$ is an induced subgraph of $G$ of the form $a \to c \leftarrow b$, where the nodes $a$ and $b$ are not adjacent in $G$.

The symbol $G \sim H$ for $G, H \in \mathrm{DAGS}(N)$ will mean that the graphs $G$ and $H$ are Markov equivalent, that is, in graphical terms, they have the same adjacencies and immoralities; for references see [13, p. 60] or [18, pp. 48-49]. An example of a Markov equivalence class is the set of full graphs over $N$.
A node $a$ together with its parent set $B$ will be called a family. Note that any directed graph over $N$ is determined by its $n$ families, one for each node. Throughout the paper, the index set of family-variable vectors will be

$$\Upsilon := \{\, (a|B) : a \in N \ \&\ \emptyset \neq B \subseteq N \setminus \{a\} \,\}.$$


Note that families with empty parent sets are not included. 

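Given $b \in N$ and $Z \subseteq N \setminus \{b\}$, the symbol $I_{b\leftarrow Z}$ will be used to denote the identifier of this pair, that is, an element of $\mathbb{R}^\Upsilon$ given by

$$I_{b\leftarrow Z}(a|B) = \begin{cases} 1 & \text{if } a = b \text{ and } B = Z,\\ 0 & \text{otherwise,} \end{cases} \qquad \text{for any } (a|B) \in \Upsilon.$$

In case $Z = \emptyset$, for any $b \in N$, $I_{b\leftarrow\emptyset}$ is the zero vector. The symbol $\eta_G$ will be used to denote the family-variable vector encoding $G \in \mathrm{DAGS}(N)$:

$$\eta_G(a|B) = \begin{cases} 1 & \text{if } B = \mathrm{pa}_G(a),\\ 0 & \text{otherwise,} \end{cases} \qquad \text{for } (a|B) \in \Upsilon.$$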


The family-variable polytope can be defined as the convex hull of the set of all possible DAG-codes over $N$:

$$\mathsf{F} := \mathrm{conv}\left(\{\, \eta_G \in \mathbb{R}^\Upsilon : G \in \mathrm{DAGS}(N) \,\}\right).$$

Clearly, the dimension of $\mathsf{F}$, defined as the dimension of its linear hull (which coincides with its affine hull, as $\mathsf{F}$ contains the zero vector), is $\dim(\mathsf{F}) = |\Upsilon| = n \cdot (2^{n-1} - 1)$. It is easy to see that none of the DAG-codes is a non-trivial convex combination of the others. In particular, the set of vertices (= extreme points) of $\mathsf{F}$ is just the set of DAG-codes.
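For very small $n$ the objects above can be enumerated directly. The following Python sketch is our own illustration, not code from the paper, and all function names in it are hypothetical. It takes $N = \{0, \ldots, n-1\}$, lists $\mathrm{DAGS}(N)$ by brute force over all choices of parent sets, builds the index set $\Upsilon$ and the DAG-codes $\eta_G$, and thus allows one to check $|\Upsilon| = n \cdot (2^{n-1} - 1)$ and that distinct graphs receive distinct codes.

```python
from itertools import combinations, product

def family_index(n):
    """Index set Upsilon: pairs (a, B) with B a non-empty subset of N minus {a}."""
    return [(a, frozenset(B))
            for a in range(n)
            for k in range(1, n)
            for B in combinations([b for b in range(n) if b != a], k)]

def all_dags(n):
    """All acyclic directed graphs over N = {0,...,n-1}, each a tuple of parent sets."""
    opts = [[frozenset(B) for k in range(n)
             for B in combinations([b for b in range(n) if b != a], k)]
            for a in range(n)]

    def acyclic(ps):
        left = set(range(n))
        while left:
            # nodes none of whose parents are still present can be removed
            ready = {a for a in left if not (ps[a] & left)}
            if not ready:
                return False  # every remaining node has a remaining parent: a cycle
            left -= ready
        return True

    return [ps for ps in product(*opts) if acyclic(ps)]

def eta(ps, idx):
    """Family-variable vector eta_G; families with empty parent sets are not recorded."""
    return tuple(1 if ps[a] == B else 0 for (a, B) in idx)

if __name__ == "__main__":
    for n in (3, 4):
        idx = family_index(n)
        dags = all_dags(n)
        codes = {eta(ps, idx) for ps in dags}
        print(n, len(idx), n * (2 ** (n - 1) - 1), len(dags), len(codes))
```

For $n = 3$ and $n = 4$ the enumeration yields 25 and 543 graphs, the known numbers of labelled acyclic directed graphs, with equally many distinct codes. The brute force grows very quickly with $n$ and is meant only for such sanity checks.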

Given two vectors $v, w \in \mathbb{R}^\Gamma$, where $\Gamma$ is a non-empty finite index set, say $\Gamma = \Upsilon$, their scalar product will be denoted by $\langle v, w\rangle_\Gamma$, or just by $\langle v, w\rangle$ if there is no danger of confusion. We also consider alternative index sets.

Specifically, the characteristic imset of $G \in \mathrm{DAGS}(N)$, introduced in [11] and denoted below by $c_G$, is an element of $\mathbb{R}^\Lambda$ with

$$\Lambda := \{\, S \subseteq N : |S| \geq 2 \,\}.$$

Recall from [21, § 3.3.2] and [1, § 2] that $c_G$ is a many-to-one linear function of $\eta_G$; the transformation is $\eta \mapsto c_\eta$, where

$$c_\eta(S) = \sum_{a \in S} \;\sum_{B:\, S \setminus \{a\} \subseteq B \subseteq N \setminus \{a\}} \eta(a|B) \qquad \text{for any } S \subseteq N,\ |S| \geq 2. \qquad (1)$$

A further fundamental observation is that $G \sim H$ for $G, H \in \mathrm{DAGS}(N)$ iff $c_G = c_H$; see [11, § 3] for a more detailed justification. The characteristic-imset polytope is defined as follows:

$$\mathsf{C} := \mathrm{conv}\left(\{\, c_G \in \mathbb{R}^\Lambda : G \in \mathrm{DAGS}(N) \,\}\right).$$

One can show that $\dim(\mathsf{C}) = |\Lambda| = 2^n - n - 1$. Of course, $\mathsf{C}$ is the image of $\mathsf{F}$ under the linear map (1).
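The map (1) and the criterion "$G \sim H$ iff $c_G = c_H$" can be checked by brute force for small $n$. The sketch below is our own hypothetical illustration (none of the function names come from the paper); it repeats the enumeration helper so that it runs on its own, computes $c_G$ from the parent sets, and compares it with the graphical description of Markov equivalence by adjacencies and immoralities.

```python
from itertools import combinations, product

def all_dags(n):
    """All acyclic directed graphs over N = {0,...,n-1}, each a tuple of parent sets."""
    opts = [[frozenset(B) for k in range(n)
             for B in combinations([b for b in range(n) if b != a], k)]
            for a in range(n)]

    def acyclic(ps):
        left = set(range(n))
        while left:
            ready = {a for a in left if not (ps[a] & left)}
            if not ready:
                return False
            left -= ready
        return True

    return [ps for ps in product(*opts) if acyclic(ps)]

def lambda_index(n):
    """Index set Lambda: subsets S of N with |S| >= 2, in a fixed order."""
    return [frozenset(S) for k in range(2, n + 1) for S in combinations(range(n), k)]

def char_imset(ps, n):
    """c_G via (1): for a DAG only B = pa_G(a) contributes, so c_G(S) counts the
    nodes a in S with S minus {a} contained in pa_G(a); acyclicity allows at most one."""
    return tuple(sum(1 for a in S if S - {a} <= ps[a]) for S in lambda_index(n))

def markov_key(ps):
    """Adjacencies and immoralities of G, which determine its Markov equivalence class."""
    skel = frozenset(frozenset((a, b)) for b in range(len(ps)) for a in ps[b])
    imm = frozenset((frozenset((a, b)), c)
                    for c in range(len(ps))
                    for a, b in combinations(sorted(ps[c]), 2)
                    if frozenset((a, b)) not in skel)
    return skel, imm
```

Grouping the graphs by `markov_key` and by `char_imset` should give the same partition: for $n = 3$ the 25 graphs fall into 11 classes, and for $n = 4$ the 543 graphs into 185 classes, with exactly one zero-one characteristic imset per class.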
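Moreover, the power set $\mathcal{P}(N) := \{A : A \subseteq N\}$ will serve as an index set for vectors used as auxiliary tools in a later proof in § 9. Given $A \subseteq N$, let us denote its indicator vector by

$$\delta_A(S) := \begin{cases} 1 & \text{if } S = A,\\ 0 & \text{if } S \subseteq N,\ S \neq A, \end{cases}$$

and define the standard imset for $G \in \mathrm{DAGS}(N)$ as an element of $\mathbb{R}^{\mathcal{P}(N)}$:

$$u_G := \delta_N - \delta_\emptyset + \sum_{a \in N} \left\{ \delta_{\mathrm{pa}_G(a)} - \delta_{\{a\} \cup \mathrm{pa}_G(a)} \right\}. \qquad (2)$$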


Recall from [21, § 3.3] that $c_G$ is a one-to-one affine function of $u_G$, specifically

$$c_G(T) = 1 - \sum_{S:\, T \subseteq S \subseteq N} u_G(S) \qquad \text{for } T \subseteq N,\ |T| \geq 2. \qquad (3)$$
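Formulas (2) and (3) can be cross-checked against (1) on every graph for small $n$. The Python sketch below is a hypothetical illustration of ours (helper names are not from the paper): it stores $u_G$ sparsely as a dictionary indexed by subsets of $N = \{0, \ldots, n-1\}$, evaluates the right-hand side of (3), and compares the result with $c_G$ computed directly from the parent sets as in (1).

```python
from itertools import combinations, product

def all_dags(n):
    """All acyclic directed graphs over N = {0,...,n-1}, each a tuple of parent sets."""
    opts = [[frozenset(B) for k in range(n)
             for B in combinations([b for b in range(n) if b != a], k)]
            for a in range(n)]

    def acyclic(ps):
        left = set(range(n))
        while left:
            ready = {a for a in left if not (ps[a] & left)}
            if not ready:
                return False
            left -= ready
        return True

    return [ps for ps in product(*opts) if acyclic(ps)]

def lambda_index(n):
    """Subsets S of N with |S| >= 2, in a fixed order."""
    return [frozenset(S) for k in range(2, n + 1) for S in combinations(range(n), k)]

def char_imset(ps, n):
    """c_G computed directly from (1)."""
    return tuple(sum(1 for a in S if S - {a} <= ps[a]) for S in lambda_index(n))

def standard_imset(ps, n):
    """u_G from (2), stored as {subset of N: value}; missing subsets have value 0."""
    u = {}

    def add(S, v):
        u[S] = u.get(S, 0) + v

    add(frozenset(range(n)), 1)
    add(frozenset(), -1)
    for a in range(n):
        add(ps[a], 1)
        add(ps[a] | {a}, -1)
    return u

def char_from_standard(u, n):
    """Right-hand side of (3): 1 minus the sum of u_G(S) over T <= S <= N."""
    return tuple(1 - sum(v for S, v in u.items() if T <= S) for T in lambda_index(n))
```

For a full graph the sum in (2) telescopes and $u_G$ vanishes, so (3) gives $c_G \equiv 1$, as expected for a graph in which every subset $T$ with $|T| \geq 2$ is complete.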


In particular, the combination of an earlier characterization [19, Theorem 4] of the vertices of the standard-imset polytope with (3) implies that the set of vertices (= extreme points) of the characteristic-imset polytope $\mathsf{C}$ is just the set of characteristic imsets $c_G$ for $G \in \mathrm{DAGS}(N)$. In other words, no characteristic imset is a non-trivial convex combination of the others.


3 Elementary facts on facets and some conventions 

Recall the basic concept of a face/facet of a polytope. 

Definition 1 (dimension, face, facet) 

Let $P$ be a polytope in $\mathbb{R}^\Gamma$, where $\Gamma \neq \emptyset$ is finite. Its dimension is defined as the dimension of its affine hull, which is a translate of a linear subspace of $\mathbb{R}^\Gamma$. A set $F \subseteq P$ is called a face of $P$ if there exist a vector $o \in \mathbb{R}^\Gamma$ and a constant $u \in \mathbb{R}$ such that

- $P \subseteq \{v \in \mathbb{R}^\Gamma : \langle o, v\rangle \leq u\}$, and

- $F = \{v \in P : \langle o, v\rangle = u\}$.

We say then that the face $F$ is defined by the inequality $\langle o, v\rangle \leq u$. Every face of a polytope is a (possibly empty) polytope as well; thus, its dimension is defined. A facet of $P$ is a face of dimension $\dim(P) - 1$.

The function $v \in \mathbb{R}^\Gamma \mapsto \langle o, v\rangle$, where $o \in \mathbb{R}^\Gamma$, is typically a linear objective to be maximized by a linear program; with a small abuse of terminology we will call $o \in \mathbb{R}^\Gamma$ an objective.

Note that the dimension of a face is one less than the maximum number of 
affinely independent vectors in the face. An alternative equivalent definition 
of a facet is that it is a sub-maximal face with respect to inclusion. 

Lemma 1 Given a polytope $P$ in $\mathbb{R}^\Gamma$, $0 < |\Gamma| < \infty$, a face $F \subseteq P$ is a facet of $P$ iff the only face $F'$ of $P$ with $F \subsetneq F'$ is $F' = P$ itself.

Proof The sufficiency follows from the fact that, for every pair of faces $F_1 \subseteq F_2$ of $P$ with $\dim(F_1) < d < \dim(F_2)$, a face $F_3$ of $P$ exists with $F_1 \subseteq F_3 \subseteq F_2$ and $\dim(F_3) = d$; see, for example, [4, Corollary 9.7]. For the necessity, realize that if $F_1 \subsetneq F_2$ are faces of $P$ then $\dim(F_1) < \dim(F_2)$; see [4, Corollary 5.5].

The consequence is an auxiliary observation, applied later in the paper. 

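Corollary 1 Let $P \subseteq \mathbb{R}^\Gamma$, $0 < |\Gamma| < \infty$, be a polytope and let $\langle o_1, v\rangle \leq u_1$ and $\langle o_2, v\rangle \leq u_2$ be valid inequalities for $v \in P$ such that

$$\exists\, w_1 \in P:\ \langle o_1, w_1\rangle < u_1 \ \&\ \langle o_2, w_1\rangle = u_2 \quad\text{and}\quad \exists\, w_2 \in P:\ \langle o_2, w_2\rangle < u_2. \qquad (4)$$

Then no combination of these inequalities $\langle \alpha \cdot o_1 + \beta \cdot o_2, v\rangle \leq \alpha \cdot u_1 + \beta \cdot u_2$ with $\alpha, \beta > 0$ is a facet-defining inequality for $P$.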

Proof Let $F_1$, $F_2$ and $F$ be the faces of $P$ defined by the inequalities $\langle o_1, v\rangle \leq u_1$, $\langle o_2, v\rangle \leq u_2$ and their combination $\langle \alpha \cdot o_1 + \beta \cdot o_2, v\rangle \leq \alpha \cdot u_1 + \beta \cdot u_2$, respectively. Given $v \in F$ one has

$$\alpha \cdot \underbrace{\{\langle o_1, v\rangle - u_1\}}_{\leq 0} + \beta \cdot \underbrace{\{\langle o_2, v\rangle - u_2\}}_{\leq 0} = 0,$$

which implies that the expressions in braces must vanish. In other words, $F \subseteq F_1 \cap F_2$. Assume for a contradiction that $F$ is a facet. By Lemma 1, either $F_1 = P$ or $F_1 = F$; the same holds for $F_2$. Since (4) implies $w_1 \in P \setminus F_1$ and $w_2 \in P \setminus F_2$, one necessarily has $F_1 = F = F_2$. However, this contradicts the existence of $w_1 \in F_2 \setminus F_1$ assumed in (4).

In this paper we mainly deal with the family-variable polytope $\mathsf{F}$. Every face of $\mathsf{F}$ can be identified with a set of acyclic directed graphs. Specifically:

$$F \subseteq \mathsf{F} \text{ a face of } \mathsf{F} \;\mapsto\; S = \{\, G \in \mathrm{DAGS}(N) : \eta_G \in F \,\}.$$

This correspondence preserves inclusion, that is, $F_1 \subseteq F_2$ for faces of $\mathsf{F}$ iff $S_1 \subseteq S_2$ for the corresponding sets of graphs $S_i \subseteq \mathrm{DAGS}(N)$. The identification is possible owing to a basic fact from the theory of polytopes that every face $F$ of a polytope $P$ is the convex hull of the set of vertices of $P$ which belong to $F$; see [2, Lemma VI.1.1] or [25, Proposition 2.3(i)]. Since the vertices of $\mathsf{F}$ are just the DAG-codes $\eta_G$, where $G \in \mathrm{DAGS}(N)$, every face $F$ of $\mathsf{F}$ can be identified with a subset of $\mathrm{DAGS}(N)$. This leads to the following convention.

Definition 2 (a set of graphs interpreted as a face) 

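We will call a set $S \subseteq \mathrm{DAGS}(N)$ a face (of the family-variable polytope $\mathsf{F}$) if $\mathrm{conv}(\{\eta_G \in \mathbb{R}^\Upsilon : G \in S\})$ is a face of $\mathsf{F}$. Analogously, $S \subseteq \mathrm{DAGS}(N)$ will be called a facet (of $\mathsf{F}$) if $\mathrm{conv}(\{\eta_G \in \mathbb{R}^\Upsilon : G \in S\})$ is a facet of $\mathsf{F}$.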

A direct method to show that a face $S \subseteq \mathrm{DAGS}(N)$ is a facet is to show that the respective geometric face $F$ has dimension $\dim(\mathsf{F}) - 1$, which means to find $\dim(\mathsf{F})$ affinely independent vectors in $F$. Since the vertices of $F$ are just the family-variable vectors $\eta_G$ for $G \in S$, the task, more or less, reduces to finding a subset $S' \subseteq S$ of cardinality $|\Upsilon| = n \cdot (2^{n-1} - 1)$ such that the vectors $\{\eta_G \in \mathbb{R}^\Upsilon : G \in S'\}$ are affinely independent.
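For small $n$ this direct method can be carried out numerically. The following Python sketch is ours and only illustrative (all names are hypothetical): the dimension of $\mathrm{conv}(V)$ for a finite set $V$ of vectors equals the rank of the differences $v - v_0$ to a fixed $v_0 \in V$, which we compute by exact Gaussian elimination over `fractions.Fraction`. As an example, it can be applied to the faces defined by the inequalities $-\eta(a|B) \leq 0$, which Lemma 2(iii) in § 4 shows to be facets.

```python
from fractions import Fraction
from itertools import combinations, product

def family_index(n):
    """Index set Upsilon: pairs (a, B) with B a non-empty subset of N minus {a}."""
    return [(a, frozenset(B)) for a in range(n) for k in range(1, n)
            for B in combinations([b for b in range(n) if b != a], k)]

def all_dags(n):
    """All acyclic directed graphs over N = {0,...,n-1}, each a tuple of parent sets."""
    opts = [[frozenset(B) for k in range(n)
             for B in combinations([b for b in range(n) if b != a], k)]
            for a in range(n)]

    def acyclic(ps):
        left = set(range(n))
        while left:
            ready = {a for a in left if not (ps[a] & left)}
            if not ready:
                return False
            left -= ready
        return True

    return [ps for ps in product(*opts) if acyclic(ps)]

def eta(ps, idx):
    """Family-variable vector eta_G."""
    return tuple(1 if ps[a] == B else 0 for (a, B) in idx)

def rank(rows):
    """Rank of a list of rational vectors by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def face_dim(vectors):
    """Dimension of conv(vectors); -1 for the empty face."""
    if not vectors:
        return -1
    v0 = vectors[0]
    return rank([[x - y for x, y in zip(v, v0)] for v in vectors[1:]])
```

With `codes` the list of all DAG-codes for $n = 3$, `face_dim(codes)` should equal $|\Upsilon| = 9$, and for each coordinate $j$ the face `[v for v in codes if v[j] == 0]` should have dimension 8, that is, it is a facet.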

We accept a standardization convention that valid inequalities for vectors $\eta \in \mathsf{F}$ in the family-variable polytope will be written in the upper-bound form:

$$\langle o, \eta\rangle \leq u \quad \text{where } o \in \mathbb{R}^\Upsilon \text{ is an objective and } u \in \mathbb{R} \text{ an upper bound.} \qquad (5)$$

Note that any lower-bound inequality $\langle o', \eta\rangle \geq l$ can be replaced by $\langle o, \eta\rangle \leq u$, where $o = -o'$ and $u = -l$. Since $\mathsf{F}$ is a rational polytope, its facets are defined by inequalities with rational coefficients, that is, by (5) with $o \in \mathbb{Q}^\Upsilon$. By multiplying (5) by a suitable positive factor one can get a (unique) integer objective $o \in \mathbb{Z}^\Upsilon$ whose components have no common prime divisor. Since the vertices of $\mathsf{F}$ are zero-one vectors, the tight upper bound in (5) must then be an integer as well: $u \in \mathbb{Z}$.
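The normalization just described can be written out as a short helper. This is a hypothetical sketch of ours, not taken from the paper, using only the standard `fractions` and `math` modules (`math.lcm` with several arguments requires Python 3.9 or later). It clears the denominators of $o$ with their least common multiple, divides by the greatest common divisor of the resulting integer entries, and applies the same positive factor to the bound $u$.

```python
from fractions import Fraction
from math import gcd, lcm

def primitive_integer_form(o, u):
    """Scale <o, eta> <= u by a positive factor so that o becomes an integer vector
    whose entries have no common prime divisor; u is scaled by the same factor."""
    fo = [Fraction(x) for x in o]
    den = lcm(*(x.denominator for x in fo))   # clears all denominators of o
    ints = [int(x * den) for x in fo]
    g = 0
    for x in ints:
        g = gcd(g, x)                         # gcd of the integer entries (non-negative)
    if g == 0:
        raise ValueError("the zero objective cannot be normalized")
    return [x // g for x in ints], Fraction(u) * den / g
```

For instance, $o = (1/2, 1/3, 0)$ with $u = 5/6$ becomes $o = (3, 2, 0)$ with $u = 5$. For a facet of $\mathsf{F}$ the rescaled bound must be an integer, as argued above; the helper itself does not check this.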

Moreover, a couple of special extension conventions for vectors in $\mathbb{R}^\Upsilon$ and $\mathbb{R}^\Lambda$ will be accepted to simplify some later formulas:

- for every objective $o \in \mathbb{R}^\Upsilon$, assume $o(b|\emptyset) = 0$ for any $b \in N$,

- for any $m \in \mathbb{R}^\Lambda$, put $m(S) = 0$ for $S \subseteq N$, $|S| \leq 1$.


4 Observations on facets of the family-variable polytope 

In this section, we present a few general facts concerning faces and facets of $\mathsf{F}$ and describe explicitly those facets which contain the empty graph. Note that some of these basic observations are also mentioned and used in a parallel paper [10]. We keep the standardization convention from § 3. Facet-defining inequalities naturally divide into classes according to the value of the upper bound $u$.

Lemma 2 Assume that (5), that is, the inequality $\langle o, \eta\rangle \leq u$ with $o \in \mathbb{R}^\Upsilon$ and $u \in \mathbb{R}$, is a valid inequality for all $\eta \in \mathsf{F}$. Then $u \geq 0$.

(i) One has $u = 0$ iff the corresponding face of $\mathsf{F}$ contains the empty graph.

(ii) If $u = 0$ then the objective coefficients are non-positive:

$$o(a|B) \leq 0 \quad \text{for each } (a|B) \in \Upsilon.$$

(iii) The facet-defining inequalities tight at the empty graph are just

$$-\eta(a|B) \leq 0 \quad \text{for each } (a|B) \in \Upsilon. \qquad (6)$$

(iv) If (5) is a facet-defining inequality for $\mathsf{F}$ with $u > 0$ then the objective coefficients are non-negative and non-decreasing in the following sense:

$$o(a|B) \geq o(a|A) \geq 0 \quad \text{whenever } a \in N \text{ and } \emptyset \neq A \subseteq B \subseteq N \setminus \{a\}.$$

Note that an alternative proof of Lemma 2(iv) is in [10, Propositions]. 
Proof The zero vector in $\mathbb{R}^\Upsilon$ is the code for the empty graph and, therefore, belongs to $\mathsf{F}$. The substitution of $\eta = 0$ into (5) gives $0 \leq u$. It is clear that the inequality is tight for $\eta = 0$ iff $u = 0$, which gives (i).

As concerns (ii), assume for a contradiction that some $(a|B) \in \Upsilon$ with $o(a|B) > 0$ exists in (5) with $u = 0$. Consider $G \in \mathrm{DAGS}(N)$ with $\eta_G = I_{a\leftarrow B}$. Then $\langle o, \eta_G\rangle = o(a|B) > 0 = u$ contradicts the validity of (5).

As concerns (iii), an elementary fact is that, for every $(b|D) \in \Upsilon$, all the inequalities in (6) with $(a|B) \neq (b|D)$ are tight for the family-variable vector $\eta = I_{b\leftarrow D} \in \mathsf{F}$, but the inequality corresponding to $(b|D)$ is not. This allows us to observe that any inequality in (6) is facet-defining for $\mathsf{F}$. Indeed, any such inequality is valid for $\mathsf{F}$ and, having fixed $(a|B) \in \Upsilon$, the respective inequality $-\eta(a|B) \leq 0$ is tight for $|\Upsilon|$ affinely independent vectors, namely the zero vector in $\mathbb{R}^\Upsilon$ and the vectors $I_{b\leftarrow D}$ for $(b|D) \neq (a|B)$. The second step is to show that every facet $F$ of $\mathsf{F}$ containing the empty graph is defined by (6). The previous observations (i) and (ii) imply that the facet-defining inequality for $F$ must have the form $\langle o, \eta\rangle \leq 0$ with $o \in (-\infty, 0]^\Upsilon$. Thus, the inequality is a conic combination of those from (6). Since $F$ is assumed to be a facet, Corollary 1 can be used to show that at most one coefficient in the combination is non-zero. Indeed, if two coefficients $o(a|B)$ and $o(b|D)$ are non-zero, the above elementary fact implies for the inequalities $o(a|B) \cdot \eta(a|B) \leq 0$ and $\sum_{(c|E) \neq (a|B)} o(c|E) \cdot \eta(c|E) \leq 0$ that the condition (4) from Corollary 1 is fulfilled with $w_1 = I_{a\leftarrow B}$ and $w_2 = I_{b\leftarrow D}$. On the other hand, at least one coefficient must be non-zero, since otherwise $F = \mathsf{F}$. Therefore, the facet $F$ must be defined by one of the inequalities in (6).

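As concerns (iv), owing to the extension convention from § 3, the statement means $o(a|B) \geq o(a|A)$ for $a \in N$ and $A \subsetneq B \subseteq N \setminus \{a\}$. Assume for a contradiction that $a \in N$ and $A \subsetneq B \subseteq N \setminus \{a\}$ exist with $o(a|B) < o(a|A)$ and define $\tilde{o} \in \mathbb{R}^\Upsilon$ in the following way:

$$\tilde{o}(b|D) := \begin{cases} o(b|D) & \text{for } (b|D) \in \Upsilon,\ (b|D) \neq (a|B),\\ o(a|A) & \text{for } (b|D) = (a|B). \end{cases}$$

The next observation is that $\langle \tilde{o}, \eta\rangle \leq u$ is a valid inequality for $\mathsf{F}$. Specifically, given $G \in \mathrm{DAGS}(N)$, construct $\tilde{G} \in \mathrm{DAGS}(N)$ such that $\eta_{\tilde{G}}(a|B) = 0$ and $\langle \tilde{o}, \eta_G\rangle = \langle o, \eta_{\tilde{G}}\rangle$. Indeed, if $\mathrm{pa}_G(a) \neq B$ then simply put $\tilde{G} := G$; otherwise put $\mathrm{pa}_{\tilde{G}}(a) = A$ and $\mathrm{pa}_{\tilde{G}}(b) = \mathrm{pa}_G(b)$ for $b \in N \setminus \{a\}$, which gives

$$\langle \tilde{o}, \eta_G\rangle - \langle o, \eta_{\tilde{G}}\rangle = \tilde{o}(a|B) - o(a|A) = o(a|A) - o(a|A) = 0.$$

The definition of $\tilde{o}$ implies $\langle \tilde{o}, \eta_{\tilde{G}}\rangle - \langle o, \eta_{\tilde{G}}\rangle = \{\tilde{o}(a|B) - o(a|B)\} \cdot \eta_{\tilde{G}}(a|B) = 0$. Because (5) is valid for $\eta_{\tilde{G}}$, one can observe

$$\langle \tilde{o}, \eta_G\rangle = \langle \tilde{o}, \eta_{\tilde{G}}\rangle = \langle o, \eta_{\tilde{G}}\rangle \leq u,$$

which was desired.

Thus, (5) is the sum of the valid inequality $\langle \tilde{o}, \eta\rangle \leq u$ with a positive multiple of the valid inequality $-\eta(a|B) \leq 0$, namely with the multiple $\beta := o(a|A) - o(a|B) > 0$. The condition (4) from Corollary 1 is fulfilled with $w_1 = 0$ and $w_2 = I_{a\leftarrow B}$, which implies the contradictory conclusion that (5) is not facet-defining.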
This implies the following observation.

Corollary 2 Let $S$ be a facet of $\mathsf{F}$ in the sense of Definition 2 which does not contain the empty graph. Then $S$ is closed under super-graphs in the following sense:

if $G \in S$ is a subgraph of $H \in \mathrm{DAGS}(N)$ then $H \in S$.

Moreover, for every $(a|B) \in \Upsilon$, there exists $G \in S$ with $\mathrm{pa}_G(a) = B$.

The second statement in Corollary 2 is also derived in [10, Proposition 4] 
using slightly different arguments. 

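Proof It is enough to verify the first claim in the case that $H$ differs from $G$ in just one parent set (a general super-graph $H$ of $G$ is reached by enlarging one parent set at a time, and all intermediate graphs are subgraphs of $H$, hence acyclic), that is, in case some $a \in N$ exists with $A = \mathrm{pa}_G(a) \subseteq \mathrm{pa}_H(a) = B$ and $\mathrm{pa}_H(b) = \mathrm{pa}_G(b)$ for $b \in N \setminus \{a\}$. By Lemma 2(i), we know that $S$ is given by the inequality (5) with $u > 0$. Thus, by Lemma 2(iv), one can write $\langle o, \eta_H\rangle - \langle o, \eta_G\rangle = o(a|B) - o(a|A) \geq 0$. Assuming $G \in S$, the inequality (5) is tight for $\eta_G$ and one has

$$u = \langle o, \eta_G\rangle \leq \langle o, \eta_H\rangle \leq u \quad \text{because (5) is valid for } \eta_H.$$

Hence, $\langle o, \eta_H\rangle = u$, that is, (5) is tight for $\eta_H$, saying that $H \in S$.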

As concerns the second claim, assume for a contradiction that some $(a|B) \in \Upsilon$ exists with $\mathrm{pa}_G(a) \neq B$ for any $G \in S$. That means $S$ is contained in the face defined by $-\eta(a|B) \leq 0$. Since $\mathrm{conv}(\{\eta_G \in \mathbb{R}^\Upsilon : G \in S\})$ is a facet of $\mathsf{F}$, by Lemma 1 it coincides with the face defined by $-\eta(a|B) \leq 0$. This implies the contradictory conclusion that $S$ contains the empty graph.

An obvious modification of the natural convexity constraints gives the following valid inequalities for the family-variable polytope:

$$\sum_{B:\, \emptyset \neq B \subseteq N \setminus \{a\}} \eta(a|B) \leq 1 \qquad \text{for any } a \in N. \qquad (7)$$

Except for the degenerate case $n = 2$, these inequalities are facet-defining; see also [10, Propositions].

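Lemma 3 If $n \geq 3$ then, for every $a \in N$, (7) defines a facet of $\mathsf{F}$.

Proof We find $|\Upsilon|$ affinely independent vectors on the face. Specifically, for $\emptyset \neq B \subseteq N \setminus \{a\}$ put $\eta_{(a|B)} := I_{a\leftarrow B}$, while for $b \in N$, $b \neq a$, and $(b|D) \in \Upsilon$ put $\eta_{(b|D)} := I_{a\leftarrow N\setminus\{a,b\}} + I_{b\leftarrow D}$; the set $N \setminus \{a,b\}$ is non-empty because $n \geq 3$. These vectors are DAG-codes lying on the face and they linearly generate $\mathbb{R}^\Upsilon$. Hence, they are linearly independent, and, therefore, affinely independent.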
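The construction in the proof of Lemma 3 is explicit, so it can be checked mechanically for small $n$. The Python sketch below is our own hypothetical illustration, not code from the paper: it builds the $|\Upsilon|$ vectors listed in the proof for a chosen node $a$, and the accompanying check confirms that each of them attains equality in (7) and that together they have full rank $|\Upsilon|$ (exact rank over `fractions.Fraction`).

```python
from fractions import Fraction
from itertools import combinations

def family_index(n):
    """Index set Upsilon: pairs (a, B) with B a non-empty subset of N minus {a}."""
    return [(a, frozenset(B)) for a in range(n) for k in range(1, n)
            for B in combinations([b for b in range(n) if b != a], k)]

def rank(rows):
    """Rank of a list of rational vectors by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def lemma3_vectors(n, a):
    """The |Upsilon| DAG-codes from the proof of Lemma 3 for inequality (7) at node a."""
    if n < 3:
        raise ValueError("Lemma 3 needs n >= 3")
    idx = family_index(n)
    pos = {fam: i for i, fam in enumerate(idx)}

    def code(*fams):
        v = [0] * len(idx)
        for fam in fams:
            v[pos[fam]] = 1
        return v

    vecs = []
    for (b, D) in idx:
        if b == a:
            vecs.append(code((a, D)))                  # I_{a<-D}: the single family D -> a
        else:
            rest = frozenset(range(n)) - {a, b}        # non-empty since n >= 3
            vecs.append(code((a, rest), (b, D)))       # I_{a<-N-{a,b}} + I_{b<-D}
    return idx, vecs
```

In the second kind of graph, $b$ is not a parent of $a$ and the nodes outside $\{a, b\}$ have no parents, so the graph is acyclic even when $a \in D$. For $n = 2$ the set $N \setminus \{a, b\}$ is empty and the construction breaks down, which matches the degenerate case excluded in the lemma.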
5 Score equivalence concept

The score-based approach to learning Bayesian network structure consists in the maximization of a function $G \in \mathrm{DAGS}(N) \mapsto Q(G,D)$, where $D$ is the database of observed values and $Q$ a suitable quality criterion, also called a scoring criterion [14, p. 437], which evaluates how well the graph $G$ fits the database. The criteria used in practice turn out to be affine functions of the family-variable vector, that is, $Q(G,D) = k + \langle o, \eta_G\rangle_\Upsilon$ with $k \in \mathbb{R}$ and $o \in \mathbb{R}^\Upsilon$ encoding both $D$ and $Q$. Thus, the learning task turns into an LP problem to maximize a linear function over the vertices of the family-variable polytope $\mathsf{F}$.

Since the goal is typically to learn the structure, described by a Markov equivalence class of graphs, most criteria used in practice do not distinguish between Markov equivalent graphs, that is, one has

$$Q(G, D) = Q(H, D) \quad \text{whenever } G \text{ and } H \text{ are Markov equivalent.}$$

In the machine learning community, quality criteria satisfying the above condition are called score equivalent [3,6]. This motivates the following terminology.

Definition 3 (score equivalent objective) 

We say that a vector $o \in \mathbb{R}^\Upsilon$ is a score equivalent objective (abbreviated below as an SE objective) if it satisfies

$$\forall\, G, H \in \mathrm{DAGS}(N) \qquad G \sim H \;\Rightarrow\; \langle o, \eta_G\rangle = \langle o, \eta_H\rangle. \qquad (8)$$

Clearly, the set of SE objectives is a linear subspace of $\mathbb{R}^\Upsilon$.
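For small $n$, condition (8) can be tested literally by scoring every graph and comparing scores within Markov equivalence classes. The Python sketch below is a hypothetical illustration of ours (names are not from the paper); it identifies Markov equivalence classes by adjacencies and immoralities, as recalled in § 2, and is only a brute-force reference against which the characterization of § 6 may be compared.

```python
from itertools import combinations, product

def family_index(n):
    """Index set Upsilon: pairs (a, B) with B a non-empty subset of N minus {a}."""
    return [(a, frozenset(B)) for a in range(n) for k in range(1, n)
            for B in combinations([b for b in range(n) if b != a], k)]

def all_dags(n):
    """All acyclic directed graphs over N = {0,...,n-1}, each a tuple of parent sets."""
    opts = [[frozenset(B) for k in range(n)
             for B in combinations([b for b in range(n) if b != a], k)]
            for a in range(n)]

    def acyclic(ps):
        left = set(range(n))
        while left:
            ready = {a for a in left if not (ps[a] & left)}
            if not ready:
                return False
            left -= ready
        return True

    return [ps for ps in product(*opts) if acyclic(ps)]

def eta(ps, idx):
    """Family-variable vector eta_G."""
    return tuple(1 if ps[a] == B else 0 for (a, B) in idx)

def markov_key(ps):
    """Adjacencies and immoralities, which determine the Markov equivalence class."""
    skel = frozenset(frozenset((a, b)) for b in range(len(ps)) for a in ps[b])
    imm = frozenset((frozenset((a, b)), c) for c in range(len(ps))
                    for a, b in combinations(sorted(ps[c]), 2)
                    if frozenset((a, b)) not in skel)
    return skel, imm

def is_se_objective(o, n):
    """Brute-force test of (8): <o, eta_G> must be constant on each Markov class."""
    idx = family_index(n)
    seen = {}
    for ps in all_dags(n):
        score = sum(x * y for x, y in zip(o, eta(ps, idx)))
        if seen.setdefault(markov_key(ps), score) != score:
            return False
    return True
```

For example, the objective counting only the family $(0|\{1\})$ is not SE (it separates $1 \to 0$ from $0 \to 1$), whereas the objective $o(a|B) = |B|$, which counts the arrows of $G$, is SE because Markov equivalent graphs share their adjacencies.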

The faces and facets of $\mathsf{F}$ are defined in terms of normal vectors, which leads to the following concept.

Definition 4 (SE face/facet, closed under Markov equivalence)

We will call a face $F$ of $\mathsf{F}$ score equivalent (SE) if there exist an SE objective $o \in \mathbb{R}^\Upsilon$ and a constant $u \in \mathbb{R}$ such that the two conditions from Definition 1 hold for $P = \mathsf{F}$. By an SE facet is meant a facet of $\mathsf{F}$ which is an SE face.

A related concept is the next one: a set $S \subseteq \mathrm{DAGS}(N)$ of acyclic directed graphs is closed under Markov equivalence if

$$\forall\, G, H \in \mathrm{DAGS}(N) \qquad G \sim H \ \&\ G \in S \;\Rightarrow\; H \in S. \qquad (9)$$

Remark 1 Note that an objective determining a face is not unique. Only in the case of a facet (of a full-dimensional polytope) is it unique up to a positive multiple. Therefore, one has to be careful when testing score equivalence of a face $F$ which is not a facet, because one of the face-defining objectives for $F$ could be SE while another objective for $F$ need not be. Our definition requires the existence of at least one SE objective defining the face.

The following observation is straightforward. 
Lemma 4 A set of graphs on an SE face is closed under Markov equivalence.

Proof Given an SE objective $o$ with $F = \{\eta \in \mathsf{F} : \langle o, \eta\rangle = u\}$ for some $u \in \mathbb{R}$ and $G \in \mathrm{DAGS}(N)$ with $\langle o, \eta_G\rangle = u$, (8) implies for $H \sim G$ that $\langle o, \eta_H\rangle = u$.

An open question is whether the converse is true. 

Conjecture 1 

Every face $S \subseteq \mathrm{DAGS}(N)$ of $\mathsf{F}$ closed under Markov equivalence is an SE face.

We managed to confirm the conjecture for facets; see Theorem 1 in § 7. The arguments there are rather specific and do not apply to general faces. However, we were able to verify Conjecture 1 for $n = |N| = 3$ by an exhaustive analysis. By means of a computer, we verified for $n = 4$ that every inclusion-submaximal face among those closed under Markov equivalence is already an SE face. Our computational attempts to find a counter-example for $n = 5$ have not been successful.


6 SE objectives characterization 

To present the characterization of the linear space of SE objectives in an elegant way, we use the extension conventions from § 3.

Lemma 5 A vector $o \in \mathbb{R}^\Upsilon$ is an SE objective if and only if one (equivalently, each) of the following two conditions holds; the two conditions are equivalent to each other.

(a) For any $Z \subseteq N$ and $a, b \in N \setminus Z$, $a \neq b$, one has

$$o(b|\{a\} \cup Z) + o(a|Z) = o(a|\{b\} \cup Z) + o(b|Z). \qquad (10)$$

(b) There exists $m \in \mathbb{R}^\Lambda$ such that

$$o(a|B) = m(\{a\} \cup B) - m(B) \qquad \text{for any } a \in N,\ B \subseteq N \setminus \{a\}. \qquad (11)$$

In particular, the dimension of the linear subspace of SE objectives is $2^n - n - 1$.

Proof The condition (8) for $o \in \mathbb{R}^\Upsilon$ means $\langle o, \eta_G - \eta_H\rangle = 0$ whenever $G, H \in \mathrm{DAGS}(N)$ are such that $G \sim H$. A well-known transformational characterization of Markov equivalence [5, Theorem 2] says that $G \sim H$ if and only if there exists a sequence $G = G_1, \ldots, G_m = H$, $m \geq 1$, in $\mathrm{DAGS}(N)$ such that, for $i = 1, \ldots, m-1$, the graph $G_{i+1}$ is obtained from $G_i$ by a "covered arc reversal". This means that $G_i$ has an arrow $a \to b$ with $\mathrm{pa}_{G_i}(b) = \{a\} \cup \mathrm{pa}_{G_i}(a)$ and $G_{i+1}$ is obtained from $G_i$ by replacing $a \to b$ in $G_i$ by $b \to a$ in $G_{i+1}$; the remaining arrows are unchanged. In particular, $G_i \sim G_{i+1}$ and, provided $Z = \mathrm{pa}_{G_i}(a)$, one has

$$\eta_{G_i} - \eta_{G_{i+1}} = I_{b\leftarrow\{a\}\cup Z} + I_{a\leftarrow Z} - I_{a\leftarrow\{b\}\cup Z} - I_{b\leftarrow Z}.$$

Hence, we easily derive that (8) holds for $o \in \mathbb{R}^\Upsilon$ iff (10) holds.

It remains to show that (10) is equivalent to the existence of $m \in \mathbb{R}^\Lambda$ such that $o$ is given by (11). The sufficiency of (11) is easy: then both the LHS and the RHS in (10) have the form $m(\{a,b\} \cup Z) - m(Z)$.

The necessity of (11) can be shown by an inductive construction. Take $Z = \emptyset$ in (10) and get $o(b|\{a\}) = o(a|\{b\})$. One can put $m(\{a,b\}) := o(b|\{a\})$ for any pair of distinct $a, b \in N$. Thus, owing to the above conventions, (11) holds in case $|B| \leq 1$. To confirm (11) for $B$ with $|B| = r \geq 2$, accept the inductive hypothesis that it holds for $B'$ with $|B'| \leq r - 1$. The task is to define $m(D)$ for $D \subseteq N$ with $|D| = r + 1$ so that (11) holds for $B$ with $|B| \leq r$. Having fixed such a set $D$, for any pair of distinct elements $a, b \in D$ put $Z = D \setminus \{a, b\}$ and observe from (10) by means of the inductive premise:

$$o(b|\{a\} \cup Z) + m(\{a\} \cup Z) - m(Z) = o(a|\{b\} \cup Z) + m(\{b\} \cup Z) - m(Z).$$

The cancellation of $m(Z)$ implies that the function $b \mapsto o(b|D \setminus \{b\}) + m(D \setminus \{b\})$ for $b \in D$ is constant on $D$. Thus, one can put $m(D) := o(b|D \setminus \{b\}) + m(D \setminus \{b\})$ for any such $b \in D$, which verifies the inductive step.

The correspondence between o and m in (11) is evidently a one-to-one 
linear mapping, which implies the claim about the dimension. 
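Both halves of Lemma 5 lend themselves to a small computational check. The Python sketch below is our own hypothetical illustration (helper names are not from the paper). It writes one linear row per instance of (10), dropping terms $o(b|\emptyset)$ by the extension convention, so that the dimension of the SE subspace is $|\Upsilon|$ minus the rank of these rows; it also implements (11) and the inductive construction of $m$ from the proof, which should invert (11).

```python
from fractions import Fraction
from itertools import combinations

def family_index(n):
    """Index set Upsilon: pairs (a, B) with B a non-empty subset of N minus {a}."""
    return [(a, frozenset(B)) for a in range(n) for k in range(1, n)
            for B in combinations([b for b in range(n) if b != a], k)]

def rank(rows):
    """Rank of a list of rational vectors by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def se_constraints(n):
    """One coefficient row per instance of (10); terms o(b | emptyset) = 0 are dropped."""
    idx = family_index(n)
    pos = {fam: i for i, fam in enumerate(idx)}
    rows = []
    for a, b in combinations(range(n), 2):
        others = [c for c in range(n) if c not in (a, b)]
        for k in range(len(others) + 1):
            for Z in map(frozenset, combinations(others, k)):
                row = [0] * len(idx)
                for (c, B), sign in (((b, Z | {a}), 1), ((a, Z), 1),
                                     ((a, Z | {b}), -1), ((b, Z), -1)):
                    if B:                                  # skip empty parent sets
                        row[pos[(c, B)]] += sign
                rows.append(row)
    return idx, rows

def objective_from_m(m, idx):
    """Formula (11); m is a dict on Lambda, so m(S) = 0 for |S| <= 1 via .get."""
    return [m.get(B | {a}, 0) - m.get(B, 0) for (a, B) in idx]

def m_from_objective(o, idx, n):
    """Inductive construction from the proof: m({a,b}) = o(b|{a}), then
    m(D) = o(b | D - {b}) + m(D - {b}) for a chosen b in D."""
    val = dict(zip(idx, o))
    m = {}
    for k in range(2, n + 1):
        for D in map(frozenset, combinations(range(n), k)):
            b = min(D)            # any b in D gives the same value when o satisfies (10)
            m[D] = val[(b, D - {b})] + m.get(D - {b}, 0)
    return m
```

For $n = 3$ and $n = 4$ the computed dimension is $|\Upsilon| - \mathrm{rank} = 2^n - n - 1$, that is, 4 and 11, and a round trip $m \mapsto o \mapsto m$ through (11) and the inductive construction returns the original $m$.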

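Corollary 3 Let $o \in \mathbb{R}^\Upsilon$ be an SE objective and let $m \in \mathbb{R}^\Lambda$ satisfy (11). Then for any $T \in \Lambda$ and arbitrary $b \in T$ with $R := T \setminus \{b\}$ one has

$$\sum_{\emptyset \neq K \subseteq R} (-1)^{|R \setminus K|} \cdot o(b|K) = \sum_{L \in \Lambda:\, L \subseteq T} (-1)^{|T \setminus L|} \cdot m(L). \qquad (12)$$

In particular, the LHS of (12) does not depend on the choice of $b \in T$.

Proof Having in mind the extension conventions from § 3, write using (11):

$$\begin{aligned}
\sum_{\emptyset \neq K \subseteq R} (-1)^{|R \setminus K|} \cdot o(b|K)
&= (-1)^{|R|} \cdot \sum_{K \subseteq R} (-1)^{|K|} \cdot o(b|K) \\
&= (-1)^{|R|} \cdot \sum_{K \subseteq R} (-1)^{|K|} \cdot \{ m(\{b\} \cup K) - m(K) \} \\
&= (-1)^{|R|} \cdot (-1) \cdot \sum_{K \subseteq R} (-1)^{|K|+1} \cdot m(\{b\} \cup K) \\
&\quad + (-1)^{|R|} \cdot (-1) \cdot \sum_{K \subseteq R} (-1)^{|K|} \cdot m(K) \\
&= (-1)^{|T|} \cdot \sum_{L \subseteq T} (-1)^{|L|} \cdot m(L) = \sum_{L \in \Lambda:\, L \subseteq T} (-1)^{|T \setminus L|} \cdot m(L),
\end{aligned}$$

which concludes the proof.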
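Identity (12), including the independence of its LHS from the choice of $b \in T$, can be verified numerically. The following Python sketch is a hypothetical illustration of ours: it takes an arbitrary integer-valued $m$ on $\Lambda$ for $N = \{0, 1, 2, 3\}$, builds $o$ by (11), and compares both sides of (12) for every $T \in \Lambda$ and every $b \in T$. Exact integer arithmetic avoids any rounding issue.

```python
from itertools import combinations

def subsets(S, min_size=0):
    """All subsets of S with at least min_size elements, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for k in range(min_size, len(items) + 1)
            for c in combinations(items, k)]

def lhs12(o, T, b):
    """LHS of (12); o maps (b, K) to o(b|K) for non-empty K."""
    R = T - {b}
    return sum((-1) ** len(R - K) * o[(b, K)] for K in subsets(R, 1))

def rhs12(m, T):
    """RHS of (12): alternating sum of m(L) over L in Lambda with L <= T."""
    return sum((-1) ** len(T - L) * m[L] for L in subsets(T, 2))

if __name__ == "__main__":
    n = 4
    N = frozenset(range(n))
    m = {L: len(L) ** 3 + 2 * sum(L) for L in subsets(N, 2)}   # arbitrary m on Lambda
    o = {(b, K): m.get(K | {b}, 0) - m.get(K, 0)                # formula (11)
         for b in range(n) for K in subsets(N - {b}, 1)}
    for T in subsets(N, 2):
        print(sorted(T), {lhs12(o, T, b) for b in T}, rhs12(m, T))
```

Each printed set of LHS values should be a singleton equal to the corresponding RHS, in line with the corollary.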

Another relevant observation is the following. 

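Lemma 6 Any face of $\mathsf{F}$ containing the whole Markov equivalence class of full graphs is given by an SE objective.

Proof Assume $\langle o, \eta\rangle \leq u$ is an arbitrary defining inequality for such a face $F$ of $\mathsf{F}$, with $o \in \mathbb{R}^\Upsilon$, $u \in \mathbb{R}$. By Lemma 5(a), it is enough to show that $o$ satisfies (10). Note that, for any $Z \subseteq N$ and distinct $a, b \in N \setminus Z$, full graphs $G$ and $H$ over $N$ exist with $\eta_G - \eta_H = I_{b\leftarrow\{a\}\cup Z} + I_{a\leftarrow Z} - I_{a\leftarrow\{b\}\cup Z} - I_{b\leftarrow Z}$ (take $G$ consonant with a total order in which $Z$ precedes $a$, followed by $b$ and then the remaining nodes, and $H$ consonant with the order obtained by swapping $a$ and $b$). Hence, $\langle o, \eta_G\rangle = u = \langle o, \eta_H\rangle$ implies that (10) is true for that particular $a$, $b$ and $Z$.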

It follows from Lemma 6 that every face of $\mathsf{F}$ which contains the class of full graphs is an SE face. In particular, there is no counter-example to Conjecture 1 among the faces containing a full graph. Indeed, such a face, being closed under Markov equivalence, necessarily contains the whole set of full graphs.


7 Correspondence to supermodular functions 

In this section we characterize those facets of F which contain the set of full 
graphs. We show they coincide with SE facets and establish their relation to 
extreme supermodular functions. 

The previous results allow us to confirm Conjecture 1 for facets. 

Theorem 1 The following conditions are equivalent for a facet $S \subseteq \mathrm{DAGS}(N)$:

(a) S is closed under Markov equivalence, 

(b) S contains the whole equivalence class of full graphs, 

(c) S is SE. 

Proof To show (a)⇒(b), note by Lemma 2(iii) that $S$ cannot contain the empty graph, since otherwise it is not closed under Markov equivalence. Clearly, $S$ must be non-empty, because otherwise it is not a facet of $\mathsf{F}$. Thus, some $G \in S$ exists and one can construct a full graph $H \in \mathrm{DAGS}(N)$ such that $G$ is a subgraph of $H$. By Corollary 2, $H \in S$. Since $S$ is closed under Markov equivalence, all full graphs belong to $S$. The implication (b)⇒(c) follows from Lemma 6. The implication (c)⇒(a) was mentioned as Lemma 4 in § 5.

The next step is to recall the definition of a supermodular set function. 

Definition 5 (standardized supermodular function) 

Any vector m £ can be viewed as a real set function m : V(N) —> R. 

Such a set function will be called standardized if m(S) = 0 for S C N, |Sj < 1, 
and supermodular if 

VU,V C N m(U) + m(V) < m(U UF) + m{XJ fl V). (13) 

The following (non-negative) characteristics are ascribed to any supermodular function m: for any $a, b \in N$, $a \ne b$, and $Z \subseteq N \setminus \{a, b\}$, we will denote

$$\Delta m(a, b \,|\, Z) := m(\{a, b\} \cup Z) + m(Z) - m(\{a\} \cup Z) - m(\{b\} \cup Z).$$

It is easy to see that a set function m is supermodular iff $\Delta m(a, b \,|\, Z) \ge 0$ for any respective triplet $(a, b \,|\, Z)$; see, for example, [23, Theorem 24(iv)]. The point is that standardized supermodular functions correspond to valid inequalities for the family-variable polytope that are tight at all full graphs.
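The equivalence between the global condition (13) and the local test $\Delta m(a,b\,|\,Z) \ge 0$ is easy to check by brute force on small ground sets. The following Python sketch is only an illustration under our own conventions: nodes are the integers 0, ..., n−1, subsets are `frozenset`s, and set functions are plain callables. The helper names are hypothetical and are taken neither from the paper nor from [23].

```python
import random
from itertools import combinations


def subsets(ground):
    """All subsets of `ground` as frozensets (hypothetical helper)."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def supermodular_by_definition(m, N):
    """Condition (13): m(U) + m(V) <= m(U | V) + m(U & V) for all U, V."""
    S = subsets(N)
    return all(m(U) + m(V) <= m(U | V) + m(U & V) for U in S for V in S)


def delta(m, a, b, Z):
    """Delta m(a, b | Z) for distinct a, b and Z disjoint from {a, b}."""
    return m(Z | {a, b}) + m(Z) - m(Z | {a}) - m(Z | {b})


def supermodular_by_triplets(m, N):
    """Local test: Delta m(a, b | Z) >= 0 for every triplet (a, b | Z)."""
    return all(delta(m, a, b, Z) >= 0
               for a, b in combinations(sorted(N), 2)
               for Z in subsets(N - {a, b}))


def random_standardized(N, rng):
    """Random integer set function that vanishes on sets of size <= 1."""
    table = {S: (rng.randint(-1, 2) if len(S) >= 2 else 0) for S in subsets(N)}
    return lambda S: table[S]
```

For instance, `supermodular_by_triplets(lambda S: max(0, len(S) - 1), frozenset(range(3)))` returns `True`. The standardized function that equals 1 exactly on two-element sets fails the test, because $\Delta m(a,b\,|\,\{c\}) = -2$.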
Lemma 7 An inequality $\langle o, \eta\rangle \le u$, where $o \in \mathbb{R}^\Gamma$ and $u \in \mathbb{R}$, is valid for all $\eta \in \mathcal{F}$ and tight at any full graph over N iff it corresponds to a standardized supermodular function m in the following sense:

— o is given by (11): $o(a \,|\, B) = m(\{a\} \cup B) - m(B)$ for $a \in N$, $B \subseteq N \setminus \{a\}$,

— u is the shared value $\langle o, \eta_H\rangle$ for full graphs H over N.

Moreover, the correspondence is one-to-one and preserves conic combinations.

Proof Given such an inequality, Lemma 6 implies that o is an SE objective and Lemma 5(b) says it has the form (11). To show that m is necessarily supermodular, it suffices to observe $\Delta m(a, b \,|\, Z) \ge 0$ for any triplet $(a, b \,|\, Z)$. To this end, note that, given a triplet $(a, b \,|\, Z)$, there exist a full graph H over N and $G \in \mathrm{DAGS}(N)$ such that $\eta_H - \eta_G = I_{b \leftarrow \{a\} \cup Z} - I_{b \leftarrow Z}$. Indeed, consider a total order of the elements of N in which Z comes first, followed by a, then b, and then $N \setminus (\{a, b\} \cup Z)$. Take H to be the full graph consonant with this order and G the graph obtained from H by removing the arrow $a \to b$. Hence, $\langle o, \eta_G\rangle \le u = \langle o, \eta_H\rangle$ implies

$$0 \le \langle o, \eta_H - \eta_G\rangle = o(b \,|\, \{a\} \cup Z) - o(b \,|\, Z) \overset{(11)}{=} \Delta m(a, b \,|\, Z).$$

Conversely, given a supermodular m, Lemma 5(b) says the objective o given by (11) is SE and the full graphs H over N share the value $\langle o, \eta_H\rangle$. Thus, it is enough to show that, for any $G \in \mathrm{DAGS}(N)$, a full graph H exists with $\langle o, \eta_G\rangle \le \langle o, \eta_H\rangle$. Indeed, consider a total order consonant with G. Denote by $\mathrm{pre}(a)$ the set of (strict) predecessors of $a \in N$ in that order and by H the full graph consonant with the order. Since $\mathrm{pa}_G(a) \subseteq \mathrm{pre}(a)$, applying (13) to $U = \{a\} \cup \mathrm{pa}_G(a)$ and $V = \mathrm{pre}(a)$ shows that every summand below is non-negative:

$$\langle o, \eta_H - \eta_G\rangle = \sum_{a \in N} \{ o(a \,|\, \mathrm{pre}(a)) - o(a \,|\, \mathrm{pa}_G(a)) \} = \sum_{a \in N} \{ m(\{a\} \cup \mathrm{pre}(a)) - m(\{a\} \cup \mathrm{pa}_G(a)) - m(\mathrm{pre}(a)) + m(\mathrm{pa}_G(a)) \} \ge 0.$$

Since (11) defines an invertible linear transformation, the last claim follows easily.
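Lemma 7 can be sanity-checked numerically. Take a standardized supermodular m, build o by (11), and compare $\langle o, \eta_G\rangle$ over all DAGs with the shared value at the full graphs. By telescoping along the order, that shared value is $m(N)$. The sketch below is an illustration under our own encoding: a DAG is a tuple of parent sets, one for each node 0, ..., n−1. All helper names are hypothetical.

```python
from itertools import combinations, product


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    """Repeatedly strip nodes that have no parents among the remaining nodes."""
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(n):
    """All DAGs over {0, ..., n-1}, each as a tuple of parent sets."""
    options = [subsets(set(range(n)) - {v}) for v in range(n)]
    return [p for p in product(*options) if is_acyclic(p)]


def is_full(parents):
    """A full graph has all n(n-1)/2 arrows."""
    n = len(parents)
    return sum(len(B) for B in parents) == n * (n - 1) // 2


def objective_from(m):
    """Formula (11): o(a | B) = m({a} | B) - m(B)."""
    return lambda a, B: m(B | {a}) - m(B)


def score(o, parents):
    """<o, eta_G>: eta_G has a 1 exactly at (a | pa_G(a)) for non-empty pa_G(a)."""
    return sum(o(a, B) for a, B in enumerate(parents) if B)
```

With $m = m_{C,k}$ from (15) the maximum over all 543 DAGs with four nodes equals $m(N)$. For the non-supermodular function that is 1 exactly on pairs, a single arrow already beats every full graph.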

By the extension convention from §3, any vector $m \in \mathbb{R}^\Lambda$ can be identified with a standardized set function. With a small abuse of terminology, we say that $m \in \mathbb{R}^\Lambda$ is supermodular if its zero extension $m : \mathcal{P}(N) \to \mathbb{R}$ is a supermodular set function. By definition, the set of supermodular vectors in $\mathbb{R}^\Lambda$ is a polyhedral cone. Because this cone is pointed, it is generated by its finitely many extreme rays. This motivates the next definition.

Definition 6 (extreme supermodular function)

A standardized supermodular set function $m : \mathcal{P}(N) \to \mathbb{R}$ is called extreme if it generates an extreme ray of the standardized supermodular cone.

The following fact follows from a specific characterization of extremality of 
supermodular functions. 

Lemma 8 Let $m_1, m_2 \in \mathbb{R}^{\mathcal{P}(N)}$ generate distinct extreme rays of the standardized supermodular cone. Then the faces of $\mathcal{F}$ determined by the corresponding inequalities, as described in Lemma 7, are inclusion-incomparable.
Proof The argument is based on the result saying that a supermodular set function m is extreme iff the structural independence model produced by m is sub-maximal; see [18, Lemma 5.6] or [23, Corollary 30]. More specifically, it says that m is extreme iff any supermodular function m' with

$$\forall\, (a, b \,|\, Z):\quad \Delta m(a, b \,|\, Z) = 0 \ \Rightarrow\ \Delta m'(a, b \,|\, Z) = 0$$

satisfies one of two conditions. Either $\Delta m'(a, b \,|\, Z) = 0 \Leftrightarrow \Delta m(a, b \,|\, Z) = 0$ for every triplet $(a, b \,|\, Z)$, or even $\Delta m'(a, b \,|\, Z) = 0$ for all triplets. For a standardized m', this means that m' must be a non-negative multiple of m. Since $m_1, m_2$ generate distinct rays, a triplet $(a, b \,|\, Z)$ must exist such that $\Delta m_1(a, b \,|\, Z) > 0$ and $\Delta m_2(a, b \,|\, Z) = 0$. Otherwise the characterization, applied to $m = m_2$ and $m' = m_1$, would make $m_1$ a non-negative multiple of $m_2$. As in the proof of Lemma 7, construct a full graph H over N and $G \in \mathrm{DAGS}(N)$ with $\eta_H - \eta_G = I_{b \leftarrow \{a\} \cup Z} - I_{b \leftarrow Z}$. Then $\Delta m_1(a, b \,|\, Z) > 0$ implies that the inequality $\langle o_1, \eta\rangle \le u_1$ given by $m_1$ through (11) is not tight for $\eta_G$ because

$$u_1 - \langle o_1, \eta_G\rangle = \langle o_1, \eta_H - \eta_G\rangle = o_1(b \,|\, \{a\} \cup Z) - o_1(b \,|\, Z) \overset{(11)}{=} \Delta m_1(a, b \,|\, Z) > 0.$$

On the other hand, $\Delta m_2(a, b \,|\, Z) = 0$ implies that $\langle o_2, \eta\rangle \le u_2$ is tight for $\eta_G$. Hence, the face of $\mathcal{F}$ determined by $m_2$ is not contained in the one determined by $m_1$. The roles of the generators $m_1$ and $m_2$ are clearly interchangeable.

Now, thanks to Theorem 1(b), we are ready to characterize SE facets. 

Theorem 2 An inequality $\langle o, \eta\rangle \le u$ for $\eta \in \mathcal{F}$, where $o \in \mathbb{R}^\Gamma$ and $u \in \mathbb{R}$, is facet-defining for $\mathcal{F}$ and tight at all full graphs over N iff there exists an extreme standardized supermodular set function m such that o is determined by (11) and u is the shared value of $\langle o, \eta_H\rangle$ for full graphs H over N.

Proof First, using Lemma 1, we show that any extreme standardized supermodular function $m_i$ gives a facet of $\mathcal{F}$. Thus, assume F' is a face containing the face $F_i$ determined by $m_i$. Lemma 7 applied to $m_i$ says that the face $F_i$ contains the class of full graphs, and so does F'. Again by Lemma 7, applied to the inequality defining F', the face F' is given by a supermodular function m'. This m' must be a conic combination of finitely many generators of (all) the extreme rays: $m' = \sum_j \alpha_j \cdot m_j$, $\alpha_j \ge 0$. The assumption $F_i \subseteq F'$ implies that the coefficient $\alpha_k$ must vanish for any $k \ne i$. Indeed, by the last claim in Lemma 7, $\alpha_k > 0$ forces $F' \subseteq F_k$. The inequality defining F' is a conic combination of the inequality corresponding to $m_k$ (which defines $F_k$) and the inequality corresponding to $\sum_{j \ne k} \alpha_j \cdot m_j$. The arguments in the proof of Corollary 1 then give $F' \subseteq F_k$. However, for distinct i and k the respective faces $F_i$ and $F_k$ are inclusion-incomparable by Lemma 8, so $F_i \subseteq F' \subseteq F_k$ is impossible. Thus, either $m' = 0$, in which case $F' = \mathcal{F}$, or m' is a positive multiple of $m_i$, in which case $F' = F_i$.

Second, we show that any facet F of $\mathcal{F}$ involving all full graphs is given by an extreme standardized supermodular function. Apply Lemma 7 to F and write the respective standardized supermodular function m as a conic combination $m = \sum_j \alpha_j \cdot m_j$, $\alpha_j \ge 0$, of extreme ones. Let us assume for a contradiction that $\alpha_i \ne 0 \ne \alpha_k$ for distinct i and k. By Lemma 8, the faces corresponding to $m_i$ and $m_k$ are incomparable. In particular, let $\langle o_j, \eta\rangle \le u_j$ denote the inequality for $\eta \in \mathcal{F}$ corresponding to $m_j$. Then some $w_1 \in \mathcal{F}$ satisfies $\langle o_i, w_1\rangle = u_i$ and $\langle o_k, w_1\rangle < u_k$, and some $w_2 \in \mathcal{F}$ satisfies $\langle o_i, w_2\rangle < u_i$. The inequality corresponding to m is the sum of $\sum_{j \ne i} \alpha_j \cdot \langle o_j, \eta\rangle \le \sum_{j \ne i} \alpha_j \cdot u_j$ and of the $\alpha_i$-multiple of the inequality $\langle o_i, \eta\rangle \le u_i$. The assumption (4) of Corollary 1 is fulfilled for the vectors $w_1$ and $w_2$ above, which gives the contradictory conclusion that F is not a facet. Thus, at most one of the coefficients $\alpha_j$ is non-zero. Since m must be non-zero, it is a positive multiple of some $m_j$.

Thus, Theorem 2 transforms the problem of testing certain facets of $\mathcal{F}$ into the task of verifying whether the respective supermodular function is extreme. Note that a simple linear criterion for testing extremality of a standardized supermodular function m has recently been proposed in [23]. The criterion consists in solving a linear equation system determined by the combinatorial structure of the so-called core polytope ascribed to m:

$$C(m) := \Big\{ [v_a]_{a \in N} \in \mathbb{R}^N :\ \sum_{a \in N} v_a = m(N)\ \ \&\ \ \forall\, S \subseteq N\ \ \sum_{a \in S} v_a \ge m(S) \Big\}.$$

We hope that the criterion from [23] will prove useful in our context.


8 Generalized cluster inequalities and uniform matroids 

An important class of inequalities for the family-variable polytope is discussed 
in this section. We apply Theorem 2 from the previous section to show they 
define SE facets and reveal their hidden connection to uniform matroids. 

Jaakkola, Sontag, Globerson and Meila introduced in [12] an interesting class of cluster-based inequalities for $\mathcal{F}$, whose purpose was to express the acyclicity restrictions. For brevity we call them the cluster inequalities. Specifically, suppose the family-vector $\eta_G$ encoding $G \in \mathrm{DAGS}(N)$ is extended by additional components $\eta_G(a \,|\, \emptyset)$, $a \in N$, for the empty parent sets. Then the inequality ascribed to a cluster $C \subseteq N$, $|C| \ge 2$, has the form

$$1 \le \sum_{a \in C}\ \sum_{B \subseteq N \setminus \{a\}:\ B \cap C = \emptyset} \eta_G(a \,|\, B).$$

The interpretation is clear: since the induced subgraph $G_C$ is acyclic, there is at least one node a in C which has no parent in C. An important fact concerns the polyhedron specified by the cluster inequalities together with the convexity constraints $\eta(a \,|\, B) \ge 0$, $B \subseteq N \setminus \{a\}$, and $\sum_{B \subseteq N \setminus \{a\}} \eta(a \,|\, B) = 1$ for every $a \in N$. Its only integral vectors are the DAG-codes [21, Lemma 2].

The cluster inequalities have turned out to play a crucial role in the integer linear programming (ILP) approach to learning BN structure. This was confirmed computationally in [8] by the first author of this paper, who also introduced generalized cluster inequalities. Specifically, to every cluster $C \subseteq N$, $|C| \ge 2$, and $k = 1, \ldots, |C| - 1$ one can ascribe the inequality

$$k \le \sum_{a \in C}\ \sum_{B \subseteq N \setminus \{a\}:\ |B \cap C| < k} \eta_G(a \,|\, B).$$
Its interpretation is analogous: since the induced subgraph $G_C$ is acyclic, each of the first k nodes in a total order of the nodes in C consonant with $G_C$ has at most $k - 1$ parents in C. Note that for $k = |C|$ and $k = 0$ the inequalities are tight at any $G \in \mathrm{DAGS}(N)$ and are, therefore, omitted. In particular, we only consider the (generalized) cluster inequalities for $k = 1, \ldots, |C| - 1$; this also enforces $|C| \ge 2$. To transform them into standardized inequality constraints on a vector $\eta$ in $\mathcal{F} \subseteq \mathbb{R}^\Gamma$, we use the above convexity equality constraints. This gives, for any $C \subseteq N$, $|C| \ge 2$, and $k = 1, \ldots, |C| - 1$,

$$\sum_{a \in C}\ \sum_{B \subseteq N \setminus \{a\}:\ |B \cap C| \ge k} \eta(a \,|\, B) \le |C| - k. \qquad (14)$$

The point is that this k-cluster inequality (14) corresponds to an extreme standardized supermodular set function in the sense of Theorem 2.
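The validity of (14), and its tightness at the full graphs, can be confirmed by brute force for four variables. The same enumeration shows that in this case there are 17 generalized cluster inequalities falling into 6 types $(|C|, k)$. The sketch below uses our own encoding, with nodes 0, ..., n−1 and a DAG given as a tuple of parent sets. Since $\eta_G(a\,|\,B) = 1$ exactly when $B = \mathrm{pa}_G(a)$, the left-hand side of (14) simply counts the nodes $a \in C$ with $|\mathrm{pa}_G(a) \cap C| \ge k$. The helper names are hypothetical.

```python
from itertools import combinations, product


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(n):
    options = [subsets(set(range(n)) - {v}) for v in range(n)]
    return [p for p in product(*options) if is_acyclic(p)]


def cluster_lhs(parents, C, k):
    """LHS of (14) at eta_G: the number of a in C with |pa_G(a) & C| >= k."""
    return sum(1 for a in C if len(parents[a] & C) >= k)


def generalized_clusters(n):
    """All pairs (C, k) with |C| >= 2 and 1 <= k <= |C| - 1."""
    return [(C, k) for C in subsets(range(n)) if len(C) >= 2
            for k in range(1, len(C))]
```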


Lemma 9 For any $C \subseteq N$, $|C| \ge 2$, and $k = 1, \ldots, |C| - 1$, the formula

$$m_{C,k}(S) = \max\{0,\ |S \cap C| - k\} \quad \text{for any } S \subseteq N, \qquad (15)$$

gives an extreme standardized supermodular function which determines, through the formula (11), the objective coefficients in (14).

Proof Easily, the objective coefficient for $(a \,|\, B) \in \Gamma$ is

$$o_{C,k}(a \,|\, B) \overset{(11)}{=} m_{C,k}(\{a\} \cup B) - m_{C,k}(B) = \begin{cases} 1 & \text{if } a \in C \text{ and } |B \cap C| \ge k,\\ 0 & \text{otherwise,} \end{cases}$$

and the value of $\langle o_{C,k}, \eta_H\rangle$ for any full graph H over N is $|C| - k$. Hence, (15) determines through (11) the inequality (14).
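The coefficient formula for $o_{C,k}$ and the full-graph value $|C| - k$ can be checked mechanically. The following is a minimal Python sketch under our own encoding; `m_ck` and `o_ck` are hypothetical names for (15) and for (11) applied to it.

```python
from itertools import combinations


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def m_ck(S, C, k):
    """Formula (15)."""
    return max(0, len(S & C) - k)


def o_ck(a, B, C, k):
    """Formula (11) applied to m_{C,k}."""
    return m_ck(B | {a}, C, k) - m_ck(B, C, k)


def full_graph_value(n, C, k):
    """<o_{C,k}, eta_H> for the full graph H consonant with 0 < 1 < ... < n-1.

    Node 0 has no parents, so it contributes no family variable.
    """
    return sum(o_ck(a, frozenset(range(a)), C, k) for a in range(1, n))
```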

It remains to show that $m_{C,k}$ generates an extreme ray of the cone K of standardized supermodular functions. Recall that m is supermodular iff, for any triplet $A, B, Z \subseteq N$ of pairwise disjoint sets, one has

$$\Delta m(A, B \,|\, Z) := m(A \cup B \cup Z) + m(Z) - m(A \cup Z) - m(B \cup Z) \ge 0.$$

This is a re-formulation of (13), but it is enough to verify $\Delta m(a, b \,|\, Z) \ge 0$ for any $a, b \in N$, $a \ne b$, and $Z \subseteq N \setminus \{a, b\}$. It is easy to observe $m(S) \ge 0$ for any $m \in K$ and $S \subseteq N$. Since $m_{C,k}(S) = m_{C,k}(S \cap C)$ for any $S \subseteq N$, one has

$$\Delta m_{C,k}(A, B \,|\, Z) = \Delta m_{C,k}(A \cap C,\ B \cap C \,|\, Z \cap C) \quad \text{for disjoint } A, B, Z \subseteq N.$$

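To show $m_{C,k} \in K$, observe that for any triplet $(a, b \,|\, Z)$ with $\{a, b\} \cup Z \subseteq C$,

$$\Delta m_{C,k}(a, b \,|\, Z) = 1 \ \text{ if } |\{a, b\} \cup Z| = k + 1, \quad \text{and} \quad \Delta m_{C,k}(a, b \,|\, Z) = 0 \ \text{ otherwise.}$$

For triplets with $a \notin C$ or $b \notin C$, the identity above gives $\Delta m_{C,k}(a, b \,|\, Z) = 0$.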
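The displayed values of $\Delta m_{C,k}(a, b \,|\, Z)$, and the resulting supermodularity of $m_{C,k}$, can be confirmed on a five-element ground set with the following illustrative sketch. The helper names are hypothetical and mirror (15) directly.

```python
from itertools import combinations


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def m_ck(S, C, k):
    """Formula (15)."""
    return max(0, len(S & C) - k)


def delta_ck(a, b, Z, C, k):
    """Delta m_{C,k}(a, b | Z)."""
    return (m_ck(Z | {a, b}, C, k) + m_ck(Z, C, k)
            - m_ck(Z | {a}, C, k) - m_ck(Z | {b}, C, k))


def triplets(N):
    """All (a, b, Z) with a < b and Z a subset of N minus {a, b}."""
    return [(a, b, Z) for a, b in combinations(sorted(N), 2)
            for Z in subsets(N - {a, b})]
```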

We have to verify the following. If $m_{C,k} = \alpha \cdot m_1 + (1 - \alpha) \cdot m_2$, $\alpha \in (0, 1)$, is a non-trivial convex combination of $m_1, m_2 \in K$, then $m_1$ and $m_2$ are non-negative multiples of $m_{C,k}$. To show $m = \gamma \cdot m_{C,k}$ for some $\gamma \ge 0$, it is enough to verify:

(i) $m(S) = 0$ for $S \subseteq N$ with $|S \cap C| \le k$,

(ii) $m(S) = m(S \cap C)$ for any $S \subseteq N$,

(iii) $m(S) = m(T)$ for $S, T \subseteq C$, $|S| = |T| = k + 1$,

(iv) if $\gamma$ is the shared value from (iii), then $m(S) = \gamma + m(R)$ for any $R, S \subseteq C$ such that $R \subseteq S$ and $k \le |R| = |S| - 1$.

To verify (i) for $m_1, m_2$ with some such $S \subseteq N$, write

$$0 = m_{C,k}(S) = \alpha \cdot m_1(S) + (1 - \alpha) \cdot m_2(S).$$

The RHS here is a convex combination of non-negative terms; therefore, they both vanish, which means $0 = m_1(S) = m_2(S)$. To verify (ii) for $m_1, m_2$ with some $S \subseteq N$, consider $(A, B \,|\, Z) = (S \cap C,\ S \setminus C \,|\, \emptyset)$ and observe

$$0 = \Delta m_{C,k}(A, B \,|\, Z) = \alpha \cdot \Delta m_1(A, B \,|\, Z) + (1 - \alpha) \cdot \Delta m_2(A, B \,|\, Z).$$

Hence, for $i = 1, 2$, $\Delta m_i(A, B \,|\, Z) = 0$. Together with (i) for $m_i$, this implies $m_i(S) = m_i(S \cap C)$. To verify (iii), it is enough to observe $m_i(S) = m_i(T)$ in case $|S| = |T| = k + 1$ with $S \setminus T = \{s\}$ and $T \setminus S = \{t\}$. Choose $r \in S \cap T$, put $R = (S \cap T) \setminus \{r\}$ and consider the triplets $(r, t \,|\, R \cup \{s\})$ and $(r, s \,|\, R \cup \{t\})$. Both $\Delta m_{C,k}(r, t \,|\, R \cup \{s\})$ and $\Delta m_{C,k}(r, s \,|\, R \cup \{t\})$ vanish, so $0 = \Delta m_i(r, t \,|\, R \cup \{s\}) = \Delta m_i(r, s \,|\, R \cup \{t\})$ for $i = 1, 2$. Hence, by (i) for $m_i$,

$$0 = \Delta m_i(r, t \,|\, R \cup \{s\}) - \Delta m_i(r, s \,|\, R \cup \{t\}) = m_i(T) - m_i(S).$$

The condition (iv) can be verified by induction on $|S|$: (i) and (iii) for $m_i$ say that (iv) holds for $|S| = k + 1$. If $|S| > k + 1$ and $S \setminus R = \{s\}$, choose $t \in R$ and put $T = S \setminus \{t\}$. Because $0 = \Delta m_{C,k}(s, t \,|\, R \cap T)$, one gets $0 = \Delta m_i(s, t \,|\, R \cap T)$. That is, $m_i(S) - m_i(R) = m_i(T) - m_i(R \cap T) = \gamma$ by the inductive assumption.
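Extremality can also be confirmed numerically with the standard rank test for pointed polyhedral cones. A non-zero m in $K = \{m \in \mathbb{R}^\Lambda : \Delta m(a, b \,|\, Z) \ge 0 \text{ for all triplets}\}$ spans an extreme ray iff the triplet constraints that are tight at m have rank $|\Lambda| - 1$. This test is a generic fact from polyhedral theory brought in here; it is not the criterion of [23]. The sketch below applies it, in exact rational arithmetic, to every $m_{C,k}$ with $|N| \le 4$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def rank(rows):
    """Rank of an integer matrix via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def triplet_rows(N):
    """Coefficient rows of the functionals Delta m(a, b | Z) on R^Lambda."""
    lam = [S for S in subsets(N) if len(S) >= 2]
    index = {S: i for i, S in enumerate(lam)}
    rows = []
    for a, b in combinations(sorted(N), 2):
        for Z in subsets(N - {a, b}):
            row = [0] * len(lam)
            for S, coef in ((Z | {a, b}, 1), (Z, 1), (Z | {a}, -1), (Z | {b}, -1)):
                if S in index:  # sets of size <= 1 carry m(S) = 0
                    row[index[S]] += coef
            rows.append(row)
    return lam, rows


def is_extreme(m, N):
    """Rank test for a non-zero standardized supermodular m given as a callable."""
    lam, rows = triplet_rows(N)
    x = [m(S) for S in lam]
    values = [sum(c * v for c, v in zip(row, x)) for row in rows]
    if any(v < 0 for v in values) or not any(x):
        raise ValueError("m must be non-zero and supermodular")
    tight = [row for row, v in zip(rows, values) if v == 0]
    return rank(tight) == len(lam) - 1
```

As a contrast, the sum $m_{N,1} + m_{N,2}$ for $|N| = 3$ has no tight triplet at all, so it fails the test.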

Corollary 4 Any generalized cluster inequality (14) defines an SE facet of F. 

Proof Combine Lemma 9 with Theorems 2 and 1. 

The rest of this section is an observation which makes sense for a reader familiar with elementary notions of matroid theory. Thus, we assume the reader knows the basic equivalent definitions of a matroid in terms of independent sets, bases and the rank function, as given, for example, in [16, Chapter 1].

The link between generalized cluster inequalities and certain matroids is based on a duality relationship between supermodular functions and their mirror images, submodular functions. In fact, there is a one-to-one linear mapping from the cone K of standardized supermodular functions onto the cone of submodular functions $r : \mathcal{P}(N) \to \mathbb{R}$ satisfying $r(\emptyset) = 0$ and $r(N) = r(N \setminus \{a\})$ for any $a \in N$. The point is that the rank functions of non-degenerate matroids fall within this submodular cone. Specifically, one can consider the duality transformation which ascribes to any $m \in K$ the set function r given by

$$r(T) = m(N) - m(N \setminus T) \quad \text{for any } T \subseteq N.$$
This self-inverse transformation maps the supermodular function $m_{C,k}$ for $C \subseteq N$, $|C| \ge 2$, and $k = 1, \ldots, |C| - 1$, onto the submodular function

$$r_{C,k}(T) = \min\{|T \cap C|,\ |C| - k\} \quad \text{for any } T \subseteq N, \qquad (16)$$

which is the rank function of a matroid on N. However, it can be viewed as a kind of trivial “loop-adding” extension of a matroid which has C as its ground set. Indeed, the function (16) can be identified with its restriction to $\mathcal{P}(C)$, which is the rank function of the uniform matroid of rank $|C| - k$ on C; see [16, Example 1.2.7]. The bases of this matroid are just the subsets of C of cardinality $|C| - k$. The two remaining uniform matroids on C, namely those of ranks 0 and $|C|$, differ in that they are not connected. This means that a set $\emptyset \ne S \subsetneq C$ exists with $r(C) = r(S) + r(C \setminus S)$, where r is their rank function; see [16, §4.2] for this concept. Therefore, one can summarize our observation by saying that the generalized cluster inequalities for $C \subseteq N$ are in a one-to-one correspondence with connected uniform matroids on C.
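The duality transformation, formula (16) and the connectivity claim can be spot-checked with the matroid rank axioms and the separator definition quoted above. This Python sketch is ours, with hypothetical helper names; it is not code from [16] or [23].

```python
from itertools import combinations


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def m_ck(S, C, k):
    """Formula (15)."""
    return max(0, len(S & C) - k)


def dual(f, N):
    """Duality transformation T -> f(N) - f(N - T)."""
    return lambda T: f(N) - f(N - T)


def is_rank_function(r, E):
    """Matroid rank axioms: 0 <= r(X) <= |X|, monotone, submodular."""
    S = subsets(E)
    return (all(0 <= r(X) <= len(X) for X in S)
            and all(r(X) <= r(Y) for X in S for Y in S if X <= Y)
            and all(r(X | Y) + r(X & Y) <= r(X) + r(Y) for X in S for Y in S))


def is_connected(r, E):
    """No separator: no non-empty proper S with r(E) = r(S) + r(E - S)."""
    return not any(r(E) == r(S) + r(E - S) for S in subsets(E) if S and S != E)
```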

Remark 2 Note that the duality transformation is not the only one-to-one linear mapping between the considered supermodular and submodular cones; see [23, §7.2] for the details. However, this fact is not important in our context, since the other transformation leads to the same conclusion. The only difference is that it ascribes to $m_{C,k}$ the uniform matroid on C of rank k instead. On the other hand, the duality transformation has an additional property. The vertices of the core polytope ascribed to $m_{C,k}$, as defined at the end of §7, are just the incidence vectors of the bases of the uniform matroid of rank $|C| - k$.
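To see Remark 2 concretely, we borrow a classical fact that is not stated in the paper. For a supermodular m, the vertices of the core $C(m)$ are exactly the marginal vectors $v_a = m(\{a\} \cup \mathrm{pre}(a)) - m(\mathrm{pre}(a))$ obtained from all total orders of N (Shapley's theorem on convex games). Assuming that fact, the sketch below lists the marginal vectors of $m_{C,k}$ and compares them with the incidence vectors of the $(|C| - k)$-subsets of C. The helper names are hypothetical.

```python
from itertools import combinations, permutations


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def m_ck(S, C, k):
    """Formula (15)."""
    return max(0, len(S & C) - k)


def marginal_vectors(m, n):
    """v_a = m(pre(a) | {a}) - m(pre(a)) for every total order of {0, ..., n-1}."""
    vectors = set()
    for order in permutations(range(n)):
        v, pre = [0] * n, frozenset()
        for a in order:
            v[a] = m(pre | {a}) - m(pre)
            pre = pre | {a}
        vectors.add(tuple(v))
    return vectors


def in_core(v, m, n):
    """Membership in the core polytope C(m) defined at the end of Section 7."""
    return (sum(v) == m(frozenset(range(n)))
            and all(sum(v[a] for a in S) >= m(S) for S in subsets(range(n))))
```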


9 On the faces of the characteristic-imset polytope 

In this section, we introduce a one-to-one correspondence between faces of the 
characteristic-imset polytope C and SE faces of the family-variable polytope F. 
This allows us to characterize those faces of C that correspond to SE facets. 

Let $\langle z, c\rangle_\Lambda \le u$, where $z \in \mathbb{R}^\Lambda$ and $u \in \mathbb{R}$, be a valid inequality for c in the characteristic-imset polytope $\mathcal{C}$. It defines a face of $\mathcal{C}$:

$$F = \{ c \in \mathcal{C} : \langle z, c\rangle_\Lambda = u \}.$$

Substitute (1) into the inequality $\langle z, c_\eta\rangle_\Lambda \le u$ and re-arrange the terms by the components of $\eta$. This gives an inequality for $\eta \in \mathbb{R}^\Gamma$ valid for any $\eta_G$, $G \in \mathrm{DAGS}(N)$, because the image of $\eta_G$ under (1) is just $c_G$. Moreover, the objective on the LHS of the obtained inequality is SE: whenever $G \sim H$, one has $c_G = c_H$ and, therefore, $\langle z, c_G\rangle_\Lambda = \langle z, c_H\rangle_\Lambda$. Thus, any face of $\mathcal{C}$ defines an SE face of $\mathcal{F}$. The converse is also true.
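Transformation (1), in the form used in the proof of Lemma 11, sends $\eta_G$ to $c_G(S) = \sum_{a \in S} [\, S \setminus \{a\} \subseteq \mathrm{pa}_G(a) \,]$. The sketch below checks, for three and four variables, that $c_G$ is a 0/1 vector and that two DAGs share $c_G$ exactly when they are Markov equivalent. For Markov equivalence it uses the classical skeleton-plus-immoralities criterion of Verma and Pearl, an outside result brought in here. The resulting counts, 11 and 185, agree with the vertex counts of $\mathcal{C}$ reported in §10. The helper names are hypothetical.

```python
from itertools import combinations, product


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(n):
    options = [subsets(set(range(n)) - {v}) for v in range(n)]
    return [p for p in product(*options) if is_acyclic(p)]


def char_imset(parents):
    """(1) applied to eta_G: c_G(S) = #{a in S : S - {a} <= pa_G(a)} for |S| >= 2."""
    n = len(parents)
    return tuple(sum(1 for a in S if S - {a} <= parents[a])
                 for S in subsets(range(n)) if len(S) >= 2)


def markov_signature(parents):
    """Skeleton and immoralities (Verma-Pearl criterion for Markov equivalence)."""
    n = len(parents)
    skeleton = {frozenset((a, b)) for b in range(n) for a in parents[b]}
    immoralities = {(frozenset((a, c)), b) for b in range(n)
                    for a, c in combinations(sorted(parents[b]), 2)
                    if frozenset((a, c)) not in skeleton}
    return frozenset(skeleton), frozenset(immoralities)
```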

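Lemma 10 Given an SE objective $o \in \mathbb{R}^\Gamma$, there exists a unique $z_o \in \mathbb{R}^\Lambda$ such that the following holds:

$$\forall\, \eta \in \mathbb{R}^\Gamma:\quad \langle o, \eta\rangle_\Gamma = \langle z_o, c_\eta\rangle_\Lambda. \qquad (17)$$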
Specifically, one has

$$z_o(T) := \sum_{\emptyset \ne K \subseteq R} (-1)^{|R \setminus K|} \cdot o(b \,|\, K) \qquad (18)$$

for $T \in \Lambda$, with any $b \in T$ and $R := T \setminus \{b\}$.

In particular, the expression in (18) does not depend on the choice of $b \in T$.

Proof We are going to show that $z_o \in \mathbb{R}^\Lambda$ given by (18) satisfies

$$\forall\, G \in \mathrm{DAGS}(N):\quad \langle o, \eta_G\rangle_\Gamma = \langle z_o, c_G\rangle_\Lambda. \qquad (19)$$

By Corollary 3, we know that $z_o$ takes the form

$$z_o(T) = \sum_{L \in \Lambda:\, L \subseteq T} (-1)^{|T \setminus L|} \cdot m(L) \quad \text{for } T \in \Lambda, \qquad (20)$$

where $m \in \mathbb{R}^\Lambda$ is given by (11).

The next step is to note that (20) is equivalent to the relation

$$m(S) = \sum_{T \in \Lambda:\, T \subseteq S} z_o(T) \quad \text{for any } S \subseteq N, \qquad (21)$$

which can be verified by substituting (20) into the RHS of (21). To verify (19), substitute (11) into the expression for $\langle o, \eta_G\rangle_\Gamma$. Then use the definition of $\eta_G$ and that of the standard imset $u_G$:

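$$\begin{aligned}
\langle o, \eta_G\rangle_\Gamma &\overset{(11)}{=} \sum_{(a|B) \in \Gamma} \{ m(\{a\} \cup B) - m(B) \} \cdot \eta_G(a \,|\, B)\\
&= \sum_{\emptyset \ne S \subseteq N} m(S) \cdot \Big\{ \sum_{(a|B)} \eta_G(a \,|\, B) \cdot \delta_S(\{a\} \cup B) - \sum_{(a|B)} \eta_G(a \,|\, B) \cdot \delta_S(B) \Big\}\\
&= \sum_{\emptyset \ne S \subseteq N} m(S) \cdot \Big\{ \sum_{a \in N} \delta_S(\{a\} \cup \mathrm{pa}_G(a)) - \sum_{a \in N} \delta_S(\mathrm{pa}_G(a)) \Big\}\\
&= \sum_{\emptyset \ne S \subseteq N} m(S) \cdot \sum_{a \in N} \{ \delta_{\{a\} \cup \mathrm{pa}_G(a)}(S) - \delta_{\mathrm{pa}_G(a)}(S) \}\\
&= \sum_{\emptyset \ne S \subseteq N} m(S) \cdot \{ \delta_N(S) - u_G(S) \}.
\end{aligned}$$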





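Further, we substitute the relation (21) into the above expression and get:

$$\begin{aligned}
\langle o, \eta_G\rangle_\Gamma &= \sum_{\emptyset \ne S \subseteq N} m(S) \cdot \{ \delta_N(S) - u_G(S) \}
\overset{(21)}{=} \sum_{S \in \Lambda}\ \sum_{T \in \Lambda:\, T \subseteq S} z_o(T) \cdot \{ \delta_N(S) - u_G(S) \}\\
&= \sum_{T \in \Lambda} z_o(T) \cdot \sum_{S:\, T \subseteq S \subseteq N} \{ \delta_N(S) - u_G(S) \}
= \sum_{T \in \Lambda} z_o(T) \cdot \Big\{ 1 - \sum_{S:\, T \subseteq S \subseteq N} u_G(S) \Big\}\\
&= \sum_{T \in \Lambda} z_o(T) \cdot c_G(T) = \langle z_o, c_G\rangle_\Lambda.
\end{aligned}$$

Since the codes $\eta_G$ for $G \in \mathrm{DAGS}(N)$ linearly span $\mathbb{R}^\Gamma$, the relation (19) implies (17). The uniqueness of the vector $z_o$ in (17) is easy because the codes $c_G$ for $G \in \mathrm{DAGS}(N)$ span $\mathbb{R}^\Lambda$.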
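Lemma 10 can be exercised numerically. We generate an SE objective o from a random standardized integer m via (11), relying on Lemma 5(b) as in the proof of Lemma 7. We then compute $z_o$ from (18) for every choice of $b \in T$ and compare both sides of (19) over all DAGs with four nodes. This is an illustrative check with hypothetical helper names, not the authors' code.

```python
import random
from itertools import combinations, product


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(n):
    options = [subsets(set(range(n)) - {v}) for v in range(n)]
    return [p for p in product(*options) if is_acyclic(p)]


def objective_from(m):
    """(11): an SE objective built from a standardized set function m."""
    return lambda a, B: m(B | {a}) - m(B)


def z_from_objective(o, T, b):
    """(18): sum over non-empty K <= R of (-1)^{|R - K|} o(b | K), R = T - {b}."""
    R = T - {b}
    return sum((-1) ** len(R - K) * o(b, K) for K in subsets(R) if K)


def char_imset(parents, lam):
    """c_G listed in the order of `lam`."""
    return [sum(1 for a in S if S - {a} <= parents[a]) for S in lam]
```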

Every face of the characteristic-imset polytope $\mathcal{C}$ can be identified with a set of acyclic directed graphs closed under Markov equivalence:

$$F \subseteq \mathcal{C} \text{ a face of } \mathcal{C} \ \longleftrightarrow\ S = \{ G \in \mathrm{DAGS}(N) : c_G \in F \}.$$

Indeed, the arguments given above Definition 2 are also valid for $P = \mathcal{C}$. Since the vertices of $\mathcal{C}$ are just the characteristic imsets, its faces can be viewed as sets of characteristic imsets. These, however, correspond to equivalence classes of graphs over N. Thus, every face of $\mathcal{C}$ can be identified with a set of such graphs, namely with the union of the respective equivalence classes. These are just the graphs whose characteristic imsets belong to the face. It is easy to see that the correspondence preserves inclusion: $F_1 \subseteq F_2$ for faces of $\mathcal{C}$ iff $S_1 \subseteq S_2$ for the corresponding sets of graphs $S_i \subseteq \mathrm{DAGS}(N)$.

Corollary 5 There is a one-to-one correspondence between SE faces of $\mathcal{F}$ and faces of $\mathcal{C}$ which preserves inclusion: given SE faces $\tilde F_1, \tilde F_2$ of $\mathcal{F}$ and the corresponding faces $F_1, F_2$ of $\mathcal{C}$, one has $\tilde F_1 \subseteq \tilde F_2$ if and only if $F_1 \subseteq F_2$. Specifically, the SE face of $\mathcal{F}$ given by an inequality $\langle o, \eta\rangle_\Gamma \le u$ corresponds to the face of $\mathcal{C}$ given by the inequality $\langle z_o, c\rangle_\Lambda \le u$. This correspondence has the property that the sets of graphs identified with the faces coincide.

Proof It is easy to see that $\langle o, \eta\rangle_\Gamma \le u$ is valid for $\eta \in \mathcal{F}$ iff $\langle z_o, c\rangle_\Lambda \le u$ is valid for $c \in \mathcal{C}$. Moreover, by Lemma 10, the set of $G \in \mathrm{DAGS}(N)$ such that $\langle o, \eta\rangle_\Gamma \le u$ is tight for $\eta_G$ coincides with the set of $G \in \mathrm{DAGS}(N)$ such that $\langle z_o, c\rangle_\Lambda \le u$ is tight for $c_G$. Thus, an SE face of $\mathcal{F}$ and the corresponding face of $\mathcal{C}$ have the same sets of “belonging” graphs. This observation easily implies the claim about preserving the inclusion of faces.

There are two distinguished vertices of the characteristic-imset polytope $\mathcal{C}$. One of them is the 0-imset, the zero vector in $\mathbb{R}^\Lambda$, which is the characteristic imset of the empty graph over N. The other one is the 1-imset, the vector in $\mathbb{R}^\Lambda$ all of whose components are ones, which is the characteristic imset of any of the full graphs over N. It plays a crucial role in the description of the faces of $\mathcal{C}$ corresponding to SE facets of $\mathcal{F}$.

Corollary 6 SE facets of the family-variable polytope $\mathcal{F}$ correspond to those facets of the characteristic-imset polytope $\mathcal{C}$ that contain the 1-imset. None of those facets of $\mathcal{C}$ includes the 0-imset.

Proof Let F be a face of $\mathcal{C}$ corresponding to an SE facet $\tilde F$ of $\mathcal{F}$. We show, using Lemma 1, that F is a facet of $\mathcal{C}$. For a face F' of $\mathcal{C}$ with $F \subsetneq F'$, the respective SE face $\tilde F'$ of $\mathcal{F}$ satisfies $\tilde F \subsetneq \tilde F'$ by Corollary 5. Thus, necessarily $\tilde F' = \mathcal{F}$, which implies $F' = \mathcal{C}$. By Theorem 1, $\tilde F$ contains the whole equivalence class of full graphs and, by Corollary 5, F must contain the 1-imset.

Conversely, let $\tilde F$ be an SE face of $\mathcal{F}$ which corresponds to a facet F of $\mathcal{C}$ containing the 1-imset. Using Lemma 1, observe that $\tilde F$ is a facet of $\mathcal{F}$. Indeed, by Corollary 5, $\tilde F$ contains the whole equivalence class of full graphs, and the same then holds for any face $\tilde F'$ of $\mathcal{F}$ with $\tilde F \subsetneq \tilde F'$. By Lemma 6, $\tilde F'$ is SE; hence, it has a corresponding face F' of $\mathcal{C}$. By Corollary 5 one has $F \subsetneq F'$; therefore, $F' = \mathcal{C}$, which implies $\tilde F' = \mathcal{F}$.

The last claim follows easily by contradiction. Otherwise, the corresponding SE facet contains the empty graph and, by Lemma 2(iii), it is determined by (6). But none of these facets of $\mathcal{F}$ is SE.

By combining Corollary 6 with Theorems 1 and 2, one observes that the facets of $\mathcal{C}$ containing the 1-imset correspond to extreme supermodular functions. On the other hand, it follows from Corollaries 5 and 6 that the SE faces of $\mathcal{F}$ corresponding to facets of $\mathcal{C}$ not containing the 1-imset are sub-maximal SE faces with respect to inclusion, but not SE facets. That means these are SE faces $\tilde F$ of $\mathcal{F}$ such that the only SE face $\tilde F'$ of $\mathcal{F}$ with $\tilde F \subsetneq \tilde F'$ is $\tilde F' = \mathcal{F}$. Nevertheless, $\tilde F$ is not a facet of $\mathcal{F}$ since $\dim(\tilde F) < \dim(\mathcal{F}) - 1$. Example 3 in §10 shows what such sub-maximal SE faces look like.

To illustrate Corollary 5 we transform the generalized cluster inequalities (14) from §8 into the characteristic-imset frame. Specifically, fix a cluster $C \subseteq N$, $|C| \ge 2$, and $k \in \{1, \ldots, |C| - 1\}$. The coefficients z(S), $S \in \Lambda$, in the transformed k-cluster inequality vanish outside subsets of C and depend only on the cardinality of the set S:

$$z(S) = \begin{cases} (-1)^{|S| - k - 1} \dbinom{|S| - 2}{|S| - k - 1} & \text{if } S \subseteq C \text{ and } |S| \ge k + 1,\\ 0 & \text{otherwise,} \end{cases} \qquad \text{for } S \in \Lambda. \qquad (22)$$

The proof is based on an auxiliary combinatorial identity (28) from §A.


Lemma 11 In the context of the characteristic-imset polytope, the k-cluster inequality (14) for $C \subseteq N$, $|C| \ge 2$, and $k \in \{1, \ldots, |C| - 1\}$ takes the form

$$\sum_{S \in \Lambda} z(S) \cdot c(S) \le |C| - k, \quad \text{where } z(S) \text{ are given by (22).} \qquad (23)$$
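Proof By suitable substitutions we rewrite (23) into the desired form (14):

$$\begin{aligned}
\sum_{S \in \Lambda} z(S) \cdot c_\eta(S) &\overset{(22),(1)}{=} \sum_{S \subseteq C:\, |S| \ge k+1} z(S) \sum_{a \in S}\ \sum_{B:\, S \setminus \{a\} \subseteq B \subseteq N \setminus \{a\}} \eta(a \,|\, B)\\
&= \sum_{a \in C}\ \sum_{B \subseteq N \setminus \{a\}:\, |B \cap C| \ge k} \eta(a \,|\, B) \cdot \sum_{S:\, |S| \ge k+1,\ a \in S,\ S \setminus \{a\} \subseteq B \cap C} z(S).
\end{aligned}$$

It remains to show that, for fixed $a \in C$ and $B \subseteq N \setminus \{a\}$ with $|B \cap C| \ge k$, the inner sum is indeed 1. We put $\ell := |B \cap C|$, $s := \ell - k$ and write:

$$\begin{aligned}
\sum_{S:\, |S| \ge k+1,\ a \in S,\ S \setminus \{a\} \subseteq B \cap C} z(S) &= \sum_{R \subseteq B \cap C:\, |R| \ge k} z(\{a\} \cup R)
\overset{(22)}{=} \sum_{R \subseteq B \cap C:\, |R| \ge k} (-1)^{|R| - k} \binom{|R| + 1 - 2}{|R| - k}\\
&= \sum_{r = k}^{\ell} \binom{\ell}{r} (-1)^{r - k} \binom{r - 1}{r - k}
= \sum_{m = 0}^{s} (-1)^m \binom{k + s}{m + k} \binom{m + k - 1}{m} \overset{(28)}{=} 1,
\end{aligned}$$

which concludes the proof.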

Thus, it follows from Lemma 11 using Corollaries 4 and 6 that (23) defines 
a facet of C containing the 1-imset. 
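Three ingredients of Lemma 11 can be checked with the following sketch: the closed form (22), the binomial identity invoked as (28), and the equality of the left-hand sides of (23) and (14) on all DAGs with four nodes. Here `z_moebius` computes z through formula (20) applied to $m_{C,k}$. It is an illustrative check with hypothetical helper names, not the authors' code.

```python
from itertools import combinations, product
from math import comb


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(n):
    options = [subsets(set(range(n)) - {v}) for v in range(n)]
    return [p for p in product(*options) if is_acyclic(p)]


def z_cluster(S, C, k):
    """Formula (22)."""
    s = len(S)
    if not S <= C or s < k + 1:
        return 0
    return (-1) ** (s - k - 1) * comb(s - 2, s - k - 1)


def z_moebius(T, C, k):
    """(20) applied to m_{C,k}(L) = max(0, |L & C| - k)."""
    return sum((-1) ** len(T - L) * max(0, len(L & C) - k) for L in subsets(T))


def identity_28(k, s):
    """Left-hand side of the identity used in the last step of the proof."""
    return sum((-1) ** j * comb(k + s, j + k) * comb(j + k - 1, j)
               for j in range(s + 1))
```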


10 Simple illustrating examples 

To illustrate the achieved results we analyze completely the situation in the 
case of three BN variables and comment on the case of four BN variables. 

We have observed that the following inequalities are facet-defining for the family-variable polytope $\mathcal{F}$ in case $|N| = n \ge 3$:

— the non-negativity constraints (6) (see Lemma 2(iii)),

— the modified convexity constraints (7) (see Lemma 3), and

— the generalized cluster inequalities (14) (see Corollary 4).

This is a complete list of facets of $\mathcal{F}$ in the case of three BN variables. The following example illustrates the observations from §8; below we use the shorthand $\eta(a \,|\, bc)$ for $\eta(a \,|\, \{b, c\})$.

Example 1 If $N = \{a, b, c\}$ one has $|\Gamma| = 9$. The 9-dimensional polytope $\mathcal{F}$ has 25 vertices and 17 facets. Five of its facets are SE and are defined by the generalized cluster inequalities. They decompose into 3 permutation types:

• $\eta(a|b) + \eta(a|bc) + \eta(b|a) + \eta(b|ac) \le 1$ (3 inequalities of this type),
the (generalized) cluster inequality for C = {a, b} (and k = 1);
the extreme supermodular function is $m_{\{a,b\},1} = \delta_{\{a,b,c\}} + \delta_{\{a,b\}}$,

• $\eta(a|bc) + \eta(b|ac) + \eta(c|ab) \le 1$ (1 inequality of this type),
the generalized cluster inequality for C = {a, b, c} and k = 2;
it corresponds to the extreme supermodular function $m_{\{a,b,c\},2} = \delta_{\{a,b,c\}}$,

• $\eta(a|b) + \eta(a|c) + \eta(a|bc) + \eta(b|a) + \eta(b|c) + \eta(b|ac) + \eta(c|a) + \eta(c|b) + \eta(c|ab) \le 2$ (1 inequality of this type),
the (generalized) cluster inequality for C = {a, b, c} (and k = 1);
the supermodular function is $m_{\{a,b,c\},1} = 2 \cdot \delta_{\{a,b,c\}} + \delta_{\{a,b\}} + \delta_{\{a,c\}} + \delta_{\{b,c\}}$.

If one adds the nine non-negativity constraints

• $-\eta(a|b) \le 0$ (6 inequalities of this type),

• $-\eta(a|bc) \le 0$ (3 inequalities of this type),

to those five generalized cluster inequalities, then one obtains a polytope with 28 vertices. Besides the 25 vertices of $\mathcal{F}$, it has 3 additional integral vertices of the type $I_{a \leftarrow \{b\}} + I_{a \leftarrow \{c\}}$. By adding the modified convexity constraints

• $\eta(a|b) + \eta(a|c) + \eta(a|bc) \le 1$ (3 inequalities of this type),

one completes the list of facet-defining inequalities for $\mathcal{F}$.
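Part of Example 1 can be reproduced by enumeration. The sketch below encodes a, b, c as 0, 1, 2 and uses hypothetical helper names. It confirms that there are 25 DAG-codes in $\mathbb{R}^{9}$ and that all of them satisfy the five generalized cluster inequalities. It also shows that the integral vector $I_{a \leftarrow \{b\}} + I_{a \leftarrow \{c\}}$ satisfies those inequalities and non-negativity but violates the modified convexity constraint for a. It does not recompute the vertex count 28, which would need a polyhedral computation.

```python
from itertools import combinations, product


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


NODES = range(3)  # a, b, c encoded as 0, 1, 2
GAMMA = [(a, B) for a in NODES for B in subsets(set(NODES) - {a}) if B]
CLUSTERS = [(frozenset(C), k) for r in (2, 3) for C in combinations(NODES, r)
            for k in range(1, r)]
DAGS = [p for p in product(*[subsets(set(NODES) - {v}) for v in NODES])
        if is_acyclic(p)]


def eta(parents):
    """Family-variable code of a DAG given as a tuple of parent sets."""
    return {(a, B): int(parents[a] == B) for a, B in GAMMA}


def cluster_lhs(x, C, k):
    """LHS of (14) for an arbitrary vector x indexed by GAMMA."""
    return sum(v for (a, B), v in x.items() if a in C and len(B & C) >= k)
```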


In the case of four BN variables, $\mathcal{F}$ has facet-defining inequalities other than those given by (6), (7) and (14). In fact,

— there are SE facets other than those given by clusters in (14),

— there are facets besides the SE facets and those given by the non-negativity constraints (6) and the modified convexity constraints (7).


Example 2 If $N = \{a, b, c, d\}$ one has $|\Gamma| = 28$, and the 28-dimensional polytope $\mathcal{F}$ has 543 vertices and 135 facets. There exist 37 SE facets of $\mathcal{F}$, which decompose into 10 permutation types. In §B we give the list of those types. Six of those types are the generalized cluster inequalities (14), but the remaining four are not.

The substantial difference from the case of three BN variables is that the polyhedron $\mathcal{F}^*$ differs from $\mathcal{F}$. Here $\mathcal{F}^*$ is specified by the 37 SE facet-defining inequalities, 28 non-negativity constraints and 4 modified convexity constraints. We computed the vertices of $\mathcal{F}^*$ and found that, besides all the 543 DAG-codes, it has 786 additional fractional vertices in comparison with $\mathcal{F}$, which decompose into 37 permutation types. Here we give three examples of them:

Vl = 2 ^a-k-{b} 2 ^ a <—{d} ~F 2 {a,c} "h 2 {a} 

~F ^ ' -^c«— {b,d} “F 7 ) ' ^dl— {a,b,c}; 

^?2 ^ ' la <— {c} ~F {d} ~F ^ <—{b,c,d} “F ^ ' ^b<— {a} 7 ^ ' ^-b <— {a,c,d} 

+ 2 ■ -e—{fa} + 2 ■ ^c<-{d} + 2 ' Ici-{a,b} + g ‘ <-{a,6,c}) 

1 1 1 1 
V3 g la ■*—{b} ~F 2 ' ^ a ^{d} ~F 2 ' {c} “F 2 ' e- {a,c,d} 

+ g ' ^c<-{a} + g ' Ici-{d} + g ' Ic<-{a,b,d} + Ij ' Id<^{b,c}- 


Therefore, besides the facets mentioned above, the family-variable polytope $\mathcal{F}$ necessarily has additional non-SE facets. There are 66 such facet-defining inequalities, which decompose into five permutation types; see [10] for details.


The next example is devoted to the characteristic-imset polytope $\mathcal{C}$ and illustrates the observations from §9. In case $|N| = 3$, every facet of $\mathcal{C}$ contains either the 1-imset or the 0-imset.

Example 3 If $N = \{a, b, c\}$ one has $|\Lambda| = 4$. The 4-dimensional polytope $\mathcal{C}$ has 11 vertices and 13 facets; they were already discussed in [21, Examples 5, 8]. There are five facet-defining inequalities tight for the 1-imset; they correspond to the SE facets of $\mathcal{F}$ mentioned in Example 1. Here is their overview in both modes; they decompose into 3 permutation types:

• $c(ab) \le 1$ (3 inequalities of this type),

in family variables $\eta(a|b) + \eta(a|bc) + \eta(b|a) + \eta(b|ac) \le 1$,

• $c(abc) \le 1$ (1 inequality of this type),

in family variables $\eta(a|bc) + \eta(b|ac) + \eta(c|ab) \le 1$,

• $c(ab) + c(ac) + c(bc) - c(abc) \le 2$ (1 inequality of this type),
in family variables

$\eta(a|b) + \eta(a|c) + \eta(a|bc) + \eta(b|a) + \eta(b|c) + \eta(b|ac) + \eta(c|a) + \eta(c|b) + \eta(c|ab) \le 2$.

The remaining eight facet-defining inequalities of $\mathcal{C}$ are tight for the 0-imset and decompose into 4 permutation types:

• $-c(ab) \le 0$ (3 inequalities of this type),

in family variables $-\eta(a|b) - \eta(a|bc) - \eta(b|a) - \eta(b|ac) \le 0$,

• $-c(abc) \le 0$ (1 inequality of this type),

in family variables $-\eta(a|bc) - \eta(b|ac) - \eta(c|ab) \le 0$,

• $-c(ab) - c(ac) + c(abc) \le 0$ (3 inequalities of this type),

in family variables $-\eta(a|b) - \eta(a|c) - \eta(a|bc) - \eta(b|a) - \eta(c|a) \le 0$,

• $-c(ab) - c(ac) - c(bc) + 2 \cdot c(abc) \le 0$ (1 inequality of this type),

in family variables $-\eta(a|b) - \eta(a|c) - \eta(b|a) - \eta(b|c) - \eta(c|a) - \eta(c|b) \le 0$.

In family variables, these eight inequalities define sub-maximal SE faces of $\mathcal{F}$ that are not facets: they are implied by the non-negativity constraints. The $\eta$-polyhedron $\mathcal{F}'$ given by all 13 above-mentioned SE inequalities is unbounded; its lineality space has dimension 5. This polyhedron is, in fact, the pre-image of the polytope $\mathcal{C}$ under the characteristic transformation (1).
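The inequalities of Example 3 can be checked against the 11 characteristic imsets over three variables. The sketch below verifies validity and counts which inequalities are tight at the 1-imset and at the 0-imset. It also confirms that each inequality is tight at four affinely independent vertices, so it is facet-defining for the 4-dimensional polytope $\mathcal{C}$. Completeness of the list of 13 facets is not checked. The encoding and helper names are ours.

```python
from fractions import Fraction
from itertools import combinations, product

LETTERS = "abc"  # node i is printed as LETTERS[i]


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def rank(rows):
    """Rank of an integer matrix via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def name(S):
    return "".join(LETTERS[i] for i in sorted(S))


LAM = [name(S) for S in subsets(range(3)) if len(S) >= 2]  # ab, ac, bc, abc


def char_imset(parents):
    """c_G(S) = 1 iff some a in S has S - {a} among its parents."""
    return {name(S): int(any(S - {a} <= parents[a] for a in S))
            for S in subsets(range(3)) if len(S) >= 2}


DAGS = [p for p in product(*[subsets(set(range(3)) - {v}) for v in range(3)])
        if is_acyclic(p)]
IMSETS = [dict(t) for t in {tuple(sorted(char_imset(p).items())) for p in DAGS}]

# The 13 facet-defining inequalities <z, c> <= u listed in Example 3.
FACETS = [
    ({"ab": 1}, 1), ({"ac": 1}, 1), ({"bc": 1}, 1),
    ({"abc": 1}, 1),
    ({"ab": 1, "ac": 1, "bc": 1, "abc": -1}, 2),
    ({"ab": -1}, 0), ({"ac": -1}, 0), ({"bc": -1}, 0),
    ({"abc": -1}, 0),
    ({"ab": -1, "ac": -1, "abc": 1}, 0),
    ({"ab": -1, "bc": -1, "abc": 1}, 0),
    ({"ac": -1, "bc": -1, "abc": 1}, 0),
    ({"ab": -1, "ac": -1, "bc": -1, "abc": 2}, 0),
]


def lhs(z, c):
    return sum(coef * c[key] for key, coef in z.items())
```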

For four BN variables, unlike three, there are facets of the characteristic-imset polytope $\mathcal{C}$ which contain neither the 0-imset nor the 1-imset.

Example 4 In case $N = \{a, b, c, d\}$ one has $|\Lambda| = 11$, and the 11-dimensional polytope $\mathcal{C}$ has 185 vertices and 154 facets. Thus, it has 358 fewer vertices than the family-variable polytope $\mathcal{F}$, but 19 more facets than $\mathcal{F}$. Besides those 37 facets that correspond to SE facets of $\mathcal{F}$ and contain the 1-imset (Corollary 6), there exist 117 facets of $\mathcal{C}$ that do not contain the 1-imset. They decompose into 20 permutation types, which are listed in §C.

With the exception of one permutation type, all these inequalities are tight for the 0-imset. The exception is

$$-c(bc) - c(bd) - c(cd) + c(abc) + c(abd) + c(acd) + 2 \cdot c(bcd) - 2 \cdot c(abcd) \le 1, \qquad (24)$$

which in family variables reads

$$\eta(a|bc) + \eta(a|bd) + \eta(a|cd) + \eta(a|bcd) - \eta(b|c) - \eta(b|d) - \eta(c|b) - \eta(c|d) - \eta(d|b) - \eta(d|c) \le 1.$$

This permutation type consists of 4 inequalities. The $\eta$-version of the inequality (24), therefore, defines an inclusion-submaximal SE face of $\mathcal{F}$ which is not a facet. Clearly, (24) follows from the modified convexity constraint

$$\eta(a|b) + \eta(a|c) + \eta(a|d) + \eta(a|bc) + \eta(a|bd) + \eta(a|cd) + \eta(a|bcd) \le 1$$

and the non-negativity constraints $-\eta(a|b) \le 0, \ldots, -\eta(d|c) \le 0$.
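The claims about (24) can be checked on all 543 DAGs over four variables. Both forms of the left-hand side agree, the maximum is 1, and the inequality is tight neither at the 0-imset nor at the 1-imset. The helper names are hypothetical. The argument `a` selects which node plays the role of a in the displayed inequality, so looping over all four nodes covers the whole permutation type.

```python
from itertools import combinations, product


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_acyclic(parents):
    remaining = set(range(len(parents)))
    while remaining:
        sources = {v for v in remaining if not parents[v] & remaining}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(n):
    options = [subsets(set(range(n)) - {v}) for v in range(n)]
    return [p for p in product(*options) if is_acyclic(p)]


def char_imset(parents):
    """c_G as a dict keyed by the sets S with |S| >= 2."""
    n = len(parents)
    return {S: int(any(S - {x} <= parents[x] for x in S))
            for S in subsets(range(n)) if len(S) >= 2}


def lhs_24_c(c, a, n=4):
    """LHS of (24) in characteristic-imset coordinates, node `a` in the role of a."""
    N = frozenset(range(n))
    rest = N - {a}
    pairs = [frozenset(p) for p in combinations(sorted(rest), 2)]
    return (-sum(c[P] for P in pairs) + sum(c[P | {a}] for P in pairs)
            + 2 * c[rest] - 2 * c[N])


def lhs_24_eta(parents, a):
    """The same LHS in family variables."""
    rest = frozenset(range(len(parents))) - {a}
    return (int(len(parents[a]) >= 2)
            - sum(1 for x in rest if len(parents[x]) == 1 and parents[x] <= rest))
```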


11 The sufficiency of SE faces 

As explained in §5, the statistical task of learning BN structure can be turned into an LP problem to maximize an SE objective over the family-variable polytope $\mathcal{F}$. When such problems are solved with the tools of (integer) linear programming, an important question is which inequalities specify the feasible set. The computational complexity depends on how many inequalities we actually or potentially use and on how complex they are. It also depends on how closely we are able to approximate the true feasible set, which in our case is the family-variable polytope $\mathcal{F}$.

In general, facet-defining inequalities for a polytope P are the suitable ones when one maximizes a linear objective over P [24]. Since our goal is to maximize quite special linear objectives over F, a natural question is whether we really need all facet-defining inequalities for F. Indeed, the correspondence between SE faces of F and faces of the characteristic-imset polytope C explained in §9 allows one to transform the LP problem of maximizing an SE objective over F into the task of maximizing a general linear function over C. This suggests that the facets of F that are not SE may be superfluous. Note that we know from Example 2 that there are many non-SE facets of F besides those given by the non-negativity (6) and modified convexity (7) constraints.

On the other hand, transforming our LP problems completely into the frame of characteristic imsets does not appear to be advantageous from the point of view of computational complexity, as observed in the conclusions of [22]. The main theoretical reason is that the simple non-negativity and modified convexity constraints are represented in the characteristic-imset frame by a much higher number of more complex specific inequalities [21].

This motivates the idea of combining both polyhedral approaches to benefit from their different strengths. The constraints that are tight at the empty graph are clearly better represented in the family-variable frame, while the constraints that are tight at (all) the full graphs are more naturally expressed in the characteristic-imset frame. Why not stay in the family-variable frame, utilize (6) and (7) there, and combine them with SE constraints, which encode the constraints on the characteristic-imset polytope?

In this section we show that this is indeed possible. However, the original conjecture we started with, namely that one can limit oneself to the inequalities defining SE facets together with the non-negativity and modified convexity constraints, is false; a counter-example is given in §12.

The basic observation is that one can limit oneself to SE faces.

Lemma 12 Let o be an SE objective. Then the LP problem of maximizing $\eta \mapsto \langle o,\eta\rangle_{\Upsilon}$ over $\eta \in \mathbb{R}^{\Upsilon}$ in the polyhedron F' specified by the inequalities defining SE faces of F has the same optimal value as the LP problem of maximizing that function over the family-variable polytope F.

Proof A basic observation is that the image of F under the transformation (1) is C, which can be viewed as the polyhedron specified through its faces. The pre-image of C under (1) is, therefore, the polyhedron F' of $\eta$-vectors specified by the respective inequalities in the $\eta$-mode, which are, by Corollary 5, just those defining SE faces of F. Since both F and F' have C as their image under (1), it follows from Lemma 10 that the maximization of $\eta \mapsto \langle o,\eta\rangle_{\Upsilon} = \langle z_o, c_\eta\rangle$ over either of them has the same optimal value as the maximization of $c \mapsto \langle z_o, c\rangle$ over $c$ in the polytope C.

However, most of the inequalities defining SE faces of F are superfluous. 
That redundant list can be reduced as follows. 

Theorem 3 Let o be an SE objective. Then the LP problem to

$$\text{maximize } \eta \mapsto \langle o,\eta\rangle_{\Upsilon} \text{ over } \eta \in F$$

has the same optimal value as the LP problem to maximize the same function over the polyhedron specified by

- the inequalities defining SE faces that correspond to those facets of C that do not contain the 0-imset,

- the non-negativity and modified convexity constraints (6) and (7).


Proof We extend the arguments given in the proof of Lemma 12. The polytope C can be viewed as the polyhedron specified by its facet-defining inequalities. In particular, the pre-image F' of C under (1) can equivalently be defined as the polyhedron specified by the facet-defining inequalities for C rewritten in the $\eta$-mode.

Of course, the conclusion of Lemma 12 about the same optimal value holds for any polyhedron F'' such that $F \subseteq F'' \subseteq F'$. Thus, for F'' one can take the polyhedron specified by the corresponding facet-defining inequalities for C together with the non-negativity and modified convexity constraints.

The last observation is that the facet-defining inequalities for C that are tight for the 0-imset are implied by the non-negativity constraints. Indeed, the $\eta$-versions of such inequalities are tight at the empty graph, and the observation follows from Lemma 2(i)-(ii). Therefore, they can be dropped from the specification of F''.
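The last step of this proof can be illustrated for $|N| = 4$. If an inequality $\langle o,\eta\rangle_{\Upsilon} \le 0$ is valid for F, then every coefficient $o(a\mid B)$ is non-positive, because the graph in which $a$ has parent set $B$ and all other nodes have no parents is acyclic and its code is a unit vector; such an inequality is therefore a non-negative combination of the constraints $-\eta(a\mid B) \le 0$. The Python sketch below assumes (1) in the standard form $c(S)=\sum_{a\in S}\sum_{B:\,S\setminus\{a\}\subseteq B\subseteq N\setminus\{a\}}\eta(a\mid B)$, pulls back several representatives of the specific inequalities from §C (all with right-hand side 0), and checks that no $\eta$-coefficient is positive; (24), with right-hand side 1, is included for contrast. The helpers `eta_version` and `parse` are hypothetical.

```python
from itertools import combinations

NODES = "abcd"

def subsets(elems, min_size=0):
    """All subsets of `elems` with at least `min_size` elements, as frozensets."""
    elems = sorted(elems)
    return [frozenset(s) for r in range(min_size, len(elems) + 1)
            for s in combinations(elems, r)]

def eta_version(coeffs, nodes):
    """Pull back sum_S coeffs[S] * c(S) through (1): sum over non-empty T <= B."""
    return {(a, "".join(sorted(B))): sum(coeffs.get(frozenset(a) | T, 0)
                                         for T in subsets(B, 1))
            for a in nodes for B in subsets(set(nodes) - {a}, 1)}

def parse(terms):
    """Illustrative helper: {'ab': -1, 'abc': 1} -> {frozenset('ab'): -1, ...}."""
    return {frozenset(k): v for k, v in terms.items()}

# Representatives of some specific inequalities from Appendix C (right-hand side 0).
SPECIFIC = {
    "{ab}": parse({"ab": -1}),
    "{ab, ac, bc}": parse({"ab": -1, "ac": -1, "bc": -1, "abc": 2}),
    "{ab, ac}": parse({"ab": -1, "ac": -1, "abc": 1}),
    "{ab, cd}": parse({"ab": -1, "cd": -1, "abcd": 1}),
    "{ab, ac, bd}": parse({"ab": -1, "ac": -1, "bd": -1, "abc": 1, "abd": 1}),
    "{ab, ad, bc, cd}": parse({"ab": -1, "ad": -1, "bc": -1, "cd": -1, "abc": 1,
                               "abd": 1, "acd": 1, "bcd": 1, "abcd": -1}),
}
# Inequality (24) has right-hand side 1 and is included for contrast.
INEQ_24 = parse({"bc": -1, "bd": -1, "cd": -1, "abc": 1, "abd": 1, "acd": 1,
                 "bcd": 2, "abcd": -2})

for name, coeffs in SPECIFIC.items():
    print(name, max(eta_version(coeffs, NODES).values()))  # each maximum is 0
print("(24)", max(eta_version(INEQ_24, NODES).values()))   # 1
```

The positive coefficient of (24), for instance that of $\eta(a\mid bc)$, is why its derivation needs the modified convexity constraint (7) as well.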

Thus, when maximizing an SE objective, our aim of eliminating the non-SE facets of F other than (6) and (7) is achieved. The price for it is that one has to include inequalities that are not facet-defining for F, namely some of the facet-defining inequalities for C written in the $\eta$-mode.

Remark 3 It follows from the proof of Theorem 3 that the modified convexity constraints (7) are superfluous there. However, Theorem 3 can be strengthened using a stronger result from [21], which says that one can exclude the so-called specific inequalities from the list of inequalities given by SE objectives. These specific inequalities are shown in [21, §4.1] to be exact translations of the constraints (6) and (7) into the frame of characteristic imsets. The list of specific inequalities in the case |N| = 4 is given in §C; only one permutation type among them, namely (24), needs (7) for its derivation. Thus, in that strengthening of Theorem 3, the modified convexity constraints (7) must be included.

It turns out that in the case |N| = 4 the original conjecture, namely that SE facets together with (6) and (7) suffice, is true.

Corollary 7 If |N| = 4, then the LP problem to maximize $\eta \mapsto \langle o,\eta\rangle_{\Upsilon}$ over $\eta \in F$ with an SE objective $o \in \mathbb{R}^{\Upsilon}$ has the same optimal value as the LP problem to maximize the same objective over the polyhedron specified by (6), (7) and the inequalities defining SE facets.

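Proof Use Theorem 3. The facets of C that contain the 1-imset correspond to SE facets of F (Corollary 6). As shown in Example 4, the only facets of C that contain neither the 0-imset nor the 1-imset are those of type (24), whose $\eta$-versions are implied by the combination of (6) and (7).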
12 A counter-example to the original conjecture

Recently, Orlinskaya, in her thesis [15], disproved Conjecture 1 from [20] by finding a new facet-defining inequality for the characteristic-imset polytope C in the case N = {a,b,c,d,e}, which is neither tight for the 1-imset nor one of the earlier-mentioned specific inequalities, whose $\eta$-versions are derivable from (6) and (7). The inequality has this form:

$$
\begin{aligned}
&-c(ab) + 2\,c(ac) + 3\,c(ae) + c(bc) - c(bd) + 2\,c(cd)\\
&+ 5\,c(ce) + 3\,c(de) + 2\,c(abc) + 4\,c(abd) + 3\,c(abe)\\
&+ c(acd) - 2\,c(ace) + 2\,c(bcd) - c(bce) - 3\,c(cde) - 5\,c(abcd)\\
&- 2\,c(abce) - 3\,c(abde) - c(acde) + c(bcde) + 5\,c(abcde) \;\le\; 16.
\end{aligned}
\tag{25}
$$

The substitution of (1) gives the family-variable version of the inequality: 

$$
\begin{aligned}
&-\eta(a\mid b) + 2\,\eta(a\mid c) + 3\,\eta(a\mid e) + 3\,\eta(a\mid bc) + 3\,\eta(a\mid bd)\\
&+ 5\,\eta(a\mid be) + 3\,\eta(a\mid cd) + 3\,\eta(a\mid ce) + 3\,\eta(a\mid de) + 3\,\eta(a\mid bcd)\\
&+ 5\,\eta(a\mid bce) + 6\,\eta(a\mid bde) + 3\,\eta(a\mid cde) + 6\,\eta(a\mid bcde)\\
&- \eta(b\mid a) + \eta(b\mid c) - \eta(b\mid d) + 2\,\eta(b\mid ac) + 2\,\eta(b\mid ad)\\
&+ 2\,\eta(b\mid ae) + 2\,\eta(b\mid cd) - \eta(b\mid de) + 2\,\eta(b\mid acd) + 2\,\eta(b\mid ace)\\
&+ 2\,\eta(b\mid ade) + 2\,\eta(b\mid cde) + 5\,\eta(b\mid acde) + 2\,\eta(c\mid a)\\
&+ \eta(c\mid b) + 2\,\eta(c\mid d) + 5\,\eta(c\mid e) + 5\,\eta(c\mid ab) + 5\,\eta(c\mid ad)\\
&+ 5\,\eta(c\mid ae) + 5\,\eta(c\mid bd) + 5\,\eta(c\mid be) + 4\,\eta(c\mid de) + 5\,\eta(c\mid abd)\\
&+ 5\,\eta(c\mid abe) + 4\,\eta(c\mid ade) + 7\,\eta(c\mid bde) + 7\,\eta(c\mid abde) - \eta(d\mid b)\\
&+ 2\,\eta(d\mid c) + 3\,\eta(d\mid e) + 3\,\eta(d\mid ab) + 3\,\eta(d\mid ac) + 3\,\eta(d\mid ae)\\
&+ 3\,\eta(d\mid bc) + 2\,\eta(d\mid be) + 2\,\eta(d\mid ce) + 3\,\eta(d\mid abc) + 3\,\eta(d\mid abe)\\
&+ 2\,\eta(d\mid ace) + 4\,\eta(d\mid bce) + 5\,\eta(d\mid abce) + 3\,\eta(e\mid a)\\
&+ 5\,\eta(e\mid c) + 3\,\eta(e\mid d) + 6\,\eta(e\mid ab) + 6\,\eta(e\mid ac) + 6\,\eta(e\mid ad)\\
&+ 4\,\eta(e\mid bc) + 3\,\eta(e\mid bd) + 5\,\eta(e\mid cd) + 6\,\eta(e\mid abc)\\
&+ 6\,\eta(e\mid abd) + 5\,\eta(e\mid acd) + 5\,\eta(e\mid bcd) + 8\,\eta(e\mid abcd) \;\le\; 16.
\end{aligned}
\tag{26}
$$
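The substitution of (1) that turns (25) into (26) is mechanical, so it can be re-checked with a short script. The Python sketch below is only an illustration. It assumes (1) in the standard form $c(S)=\sum_{a\in S}\sum_{B:\,S\setminus\{a\}\subseteq B\subseteq N\setminus\{a\}}\eta(a\mid B)$, so that the coefficient of $\eta(a\mid B)$ is the sum of the coefficients of $c(\{a\}\cup T)$ over non-empty $T\subseteq B$. The names `eta_version`, `C25` and `o_star` are hypothetical.

```python
from itertools import combinations

NODES = "abcde"

def subsets(elems, min_size=0):
    """All subsets of `elems` with at least `min_size` elements, as frozensets."""
    elems = sorted(elems)
    return [frozenset(s) for r in range(min_size, len(elems) + 1)
            for s in combinations(elems, r)]

def eta_version(coeffs, nodes):
    """Coefficient of eta(a|B): sum of coeffs[{a} | T] over non-empty T <= B."""
    return {(a, "".join(sorted(B))): sum(coeffs.get(frozenset(a) | T, 0)
                                         for T in subsets(B, 1))
            for a in nodes for B in subsets(set(nodes) - {a}, 1)}

# Left-hand side of (25): subset of N -> coefficient (right-hand side 16).
C25 = {frozenset(k): v for k, v in {
    "ab": -1, "ac": 2, "ae": 3, "bc": 1, "bd": -1, "cd": 2, "ce": 5, "de": 3,
    "abc": 2, "abd": 4, "abe": 3, "acd": 1, "ace": -2, "bcd": 2, "bce": -1,
    "cde": -3, "abcd": -5, "abce": -2, "abde": -3, "acde": -1, "bcde": 1,
    "abcde": 5}.items()}

o_star = eta_version(C25, NODES)  # all 75 family-variable coefficients
nonzero = {k: v for k, v in o_star.items() if v != 0}
negative = sorted(k for k, v in o_star.items() if v < 0)

print(len(o_star), len(nonzero))  # 75 70
print(negative)  # [('a', 'b'), ('b', 'a'), ('b', 'd'), ('b', 'de'), ('d', 'b')]
print(o_star[("a", "bcde")], o_star[("c", "bde")], o_star[("e", "abcd")])  # 6 7 8
```

The 70 non-zero coefficients are exactly the terms of (26); the five negative ones are those of $\eta(a\mid b)$, $\eta(b\mid a)$, $\eta(b\mid d)$, $\eta(b\mid de)$ and $\eta(d\mid b)$. Moreover, the coefficients of (25) sum to 15 < 16, so (25) is indeed not tight for the 1-imset.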

Consider the corresponding SE objective $o^* \in \mathbb{R}^{\Upsilon}$, that is, for any $(a\mid B) \in \Upsilon$, $o^*(a\mid B)$ is the coefficient of $\eta(a\mid B)$ in (26). It follows immediately from Lemma 2(iv) that (26) is not facet-defining for F, because some coefficients are negative. In fact, the respective SE face of F given by $\langle o^*,\eta\rangle_{\Upsilon} = 16$, denoted below by $\mathcal{F}^*$, has dimension 53, which is far from 74, the dimension of facets of F. We checked this fact by computer: we found all 153 codes of acyclic directed graphs on $\mathcal{F}^*$; at most 54 of them are affinely independent.

On the other hand, the inequality (25) is facet-defining for C. We have computed 59 characteristic imsets on this face of C, denoted below by $\mathcal{E}^*$, and found 26 of them to be affinely independent. This implies that the dimension of $\mathcal{E}^*$ is 25, which is the dimension of facets of C.
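The dimension statements above reduce to computing the rank of the differences $p - p_0$ of the listed points. One exact way to do this is Gaussian elimination over the rationals. The Python sketch below (hypothetical helpers `rank` and `affine_dim`) applies it to a small-scale analogue rather than to the five-variable computation reported here. For $|N| = 4$, assuming the standard characteristic imset ($c(S)=1$ iff some $a\in S$ has $S\setminus\{a\}$ among its parents), it checks that the characteristic imsets affinely span 11 dimensions and that those on the facet of C given by (24) span 10.

```python
from fractions import Fraction
from itertools import combinations, product

NODES = "abcd"

def subsets(elems, min_size=0):
    """All subsets of `elems` with at least `min_size` elements, as frozensets."""
    elems = sorted(elems)
    return [frozenset(s) for r in range(min_size, len(elems) + 1)
            for s in combinations(elems, r)]

def all_dags(nodes):
    """Yield every acyclic directed graph as a dict node -> frozenset of parents."""
    for combo in product(*[subsets(set(nodes) - {v}) for v in nodes]):
        parents, remaining = dict(zip(nodes, combo)), set(nodes)
        while remaining:  # strip nodes none of whose parents are left
            free = {v for v in remaining if not parents[v] & remaining}
            if not free:
                break  # a directed cycle remains
            remaining -= free
        if not remaining:
            yield parents

SETS = subsets(NODES, 2)

def char_imset(parents):
    """c(S) = 1 iff some i in S has S minus {i} contained in its parent set."""
    return tuple(int(any(S - {i} <= parents[i] for i in S)) for S in SETS)

def rank(rows):
    """Rank over the rationals via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def affine_dim(points):
    """Dimension of the affine hull = rank of the differences from one point."""
    points = list(points)
    return rank([[x - y for x, y in zip(p, points[0])] for p in points[1:]])

imsets = sorted({char_imset(p) for p in all_dags(NODES)})
vec24 = [{"bc": -1, "bd": -1, "cd": -1, "abc": 1, "abd": 1, "acd": 1,
          "bcd": 2, "abcd": -2}.get("".join(sorted(S)), 0) for S in SETS]
on_24 = [c for c in imsets if sum(x * y for x, y in zip(vec24, c)) == 1]

print(affine_dim(imsets), affine_dim(on_24))  # 11 10
```

Exact fractions avoid the tolerance choices a floating-point rank computation would need; for vectors of this size the cost is negligible.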
To get the desired counter-example, we consider a convex combination $\eta^{\dagger}$ of all 153 codes of acyclic directed graphs on $\mathcal{F}^*$ with coefficients $y_i$:

$$\eta^{\dagger} = \sum_{i=1}^{153} y_i\cdot \eta_{G_i}, \qquad y_i \ge 0, \quad \sum_{i=1}^{153} y_i = 1,$$

where $\eta_{G_1}, \dots, \eta_{G_{153}}$ are the codes of the graphs on $\mathcal{F}^*$.

It is tedious but straightforward to verify that $\langle o^*,\eta^{\dagger}\rangle_{\Upsilon} = 16$. One can also easily check that none of the five modified convexity constraints is tight for $\eta^{\dagger}$. We also verified that the vector $c^{\dagger}$ ascribed to $\eta^{\dagger}$ by (1) is in the relative interior of $\mathcal{E}^* \subseteq C$. For this purpose, we first used the 59 vertices of $\mathcal{E}^*$ to compute its 55 facets. Then we verified computationally that $c^{\dagger}$ does not belong to any of the 55 facets of $\mathcal{E}^*$.

The above observation implies that none of the SE facets of F contains $\eta^{\dagger}$. Indeed, assume for a contradiction that $\eta^{\dagger}$ belongs to some SE facet $\mathcal{G}$ of F. Then, by Lemma 10 and Corollary 5, $c^{\dagger}$ belongs to the corresponding face $\mathcal{H}$ of C, which is, by Corollary 6, a facet of C. The facet $\mathcal{H}$ does not contain the whole of $\mathcal{E}^*$, since otherwise, by Lemma 1 applied to C and $\mathcal{E}^*$, one has $\mathcal{H} = \mathcal{E}^*$ and, by Corollary 5, $\mathcal{G} = \mathcal{F}^*$, contradicting the above-mentioned fact that (26) is not facet-defining for F. Therefore, $c^{\dagger} \in \mathcal{E}^* \cap \mathcal{H} \subsetneq \mathcal{E}^*$, which contradicts $c^{\dagger}$ belonging to the relative interior of $\mathcal{E}^*$.

These observations are enough to derive the existence of a counter-example, namely the vector $\eta^* := (1+\varepsilon)\cdot\eta^{\dagger}$, where $\varepsilon > 0$ is small enough. Indeed, $\eta^*$ satisfies all non-negativity constraints, and all other relevant inequalities, namely the 5 modified convexity constraints and the SE facet-defining inequalities for F, are valid for $\eta^{\dagger}$ but not tight for it: $\langle o,\eta^{\dagger}\rangle_{\Upsilon} < u$ for the respective $o \in \mathbb{R}^{\Upsilon}$ and $u > 0$. Since the number of these inequalities is finite, a sufficiently small $\varepsilon$ retains $\langle o,\eta^*\rangle_{\Upsilon} < u$ for each of them. On the other hand, the value of the considered SE objective $o^* \in \mathbb{R}^{\Upsilon}$ at $\eta^*$ is

$$\langle o^*,\eta^*\rangle_{\Upsilon} = (1+\varepsilon)\cdot\langle o^*,\eta^{\dagger}\rangle_{\Upsilon} = (1+\varepsilon)\cdot 16 > 16.$$

Thus, the maximum of the linear SE objective $\eta \mapsto \langle o^*,\eta\rangle_{\Upsilon}$, $\eta \in \mathbb{R}^{\Upsilon}$, over F is 16, while its value at $\eta^*$, which satisfies (6), (7) and all SE facet-defining inequalities for F, exceeds 16. This gives the desired counter-example.
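The phrase "small enough" can be made explicit. For an inequality $\langle o,\eta\rangle \le u$ with $v := \langle o,\eta^{\dagger}\rangle < u$, the point $(1+\varepsilon)\eta^{\dagger}$ still satisfies it strictly whenever $v \le 0$; if $v > 0$, this holds exactly when $\varepsilon < (u-v)/v$. A minimal Python sketch of this bound follows. The function name `epsilon_bound` is hypothetical, and the numbers are made up for illustration; they are not the values for $\eta^{\dagger}$.

```python
from fractions import Fraction

def epsilon_bound(values, rhs):
    """Supremum of eps > 0 such that (1 + eps) * v < u holds for every pair (v, u).

    Assumes v < u for each pair (valid but not tight at the point).  A pair with
    v <= 0 never restricts eps; a pair with v > 0 requires eps < (u - v) / v.
    Returns None when no pair restricts eps.
    """
    bounds = [Fraction(u - v) / v for v, u in zip(values, rhs) if v > 0]
    return min(bounds) if bounds else None

# Hypothetical values <o, eta_dagger> and right-hand sides u (not from the paper).
values = [Fraction(1, 2), Fraction(3, 4), Fraction(0), Fraction(-1)]
rhs = [1, 1, 1, 0]

eps_max = epsilon_bound(values, rhs)
eps = eps_max / 2  # any 0 < eps < eps_max works
print(eps_max)  # 1/3
print(all((1 + eps) * v < u for v, u in zip(values, rhs)), (1 + eps) * 16 > 16)  # True True
```

With finitely many inequalities the minimum over the positive-$v$ bounds is strictly positive, which is exactly the finiteness argument used in the text.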


13 Conclusions 

Let us summarize the main achievements of the paper. We dealt with two distinguished polytopes used in the ILP approach to BN structure learning, namely the family-variable polytope and the characteristic-imset polytope. Motivated by a common form of linear objectives to be maximized in BN structure learning, we introduced the concept of a score equivalent (SE) face of the family-variable polytope. We further characterized the linear space of the corresponding SE objectives (Lemma 5).

A correspondence has been established between SE faces of the family-variable polytope F and the faces of the characteristic-imset polytope C, which preserves the inclusion of faces (Corollary 5). We observed that SE facets of F correspond to those facets of C which contain a distinguished vector, called the 1-imset (Corollary 6). These facets were shown to correspond to extreme supermodular functions, which gives an elegant method to verify that an inequality is SE-facet-defining for F (Theorems 1 and 2 combined). To illustrate the method, we showed that the well-known (generalized) cluster inequalities are facet-defining for F (Corollary 4) and derived their form in the context of the characteristic-imset polytope (Lemma 11). The correspondence with extreme supermodular set functions (Theorem 2) may prove useful because of a recent extremality criterion for supermodular functions from [23].

Since a typical linear objective appearing in the ILP approach to learning BN structure is special, namely SE, we raised the question whether all facets of F are needed to specify the feasible sets for (integer) linear programs when such an objective is maximized. We succeeded in showing that one can eliminate those facets of F that are not SE, that is, those defined by non-SE normal vectors (Lemma 12, Theorem 3). Nevertheless, our original conjecture, that besides the simple non-negativity and modified convexity constraints one can limit oneself to SE facets of F, turned out to be false (a counter-example is given in §12). The moral is that one has to consider the inequalities defining facets of the characteristic-imset polytope C even though they do not define facets in the context of the family-variable polytope F.

This suggests using a combined coding of BN structures in the ILP approach. One can encode a BN structure by a concatenation of the family-variable vector and the characteristic imset and utilize the linear relation (1). Linear constraints tight at the empty graph are better represented by simple non-negativity and modified convexity inequalities in the family-variable part, while the other SE linear inequality constraints can be more naturally represented in the characteristic-imset part.
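A minimal sketch of such a combined coding, assuming that (1) has the standard form $c(S)=\sum_{a\in S}\sum_{B:\,S\setminus\{a\}\subseteq B\subseteq N\setminus\{a\}}\eta(a\mid B)$ for $|S|\ge 2$: the code of a graph is the family-variable vector followed by the characteristic imset, and (1) becomes a 0/1 linking matrix `M`. In an actual ILP model both parts would be decision variables tied together by the equalities $c = M\eta$; here, for illustration only, the coding is evaluated on all acyclic directed graphs over four nodes. All names in the sketch are hypothetical.

```python
from itertools import combinations, product

NODES = "abcd"

def subsets(elems, min_size=0):
    """All subsets of `elems` with at least `min_size` elements, as frozensets."""
    elems = sorted(elems)
    return [frozenset(s) for r in range(min_size, len(elems) + 1)
            for s in combinations(elems, r)]

def all_dags(nodes):
    """Yield every acyclic directed graph as a dict node -> frozenset of parents."""
    for combo in product(*[subsets(set(nodes) - {v}) for v in nodes]):
        parents, remaining = dict(zip(nodes, combo)), set(nodes)
        while remaining:  # strip nodes none of whose parents are left
            free = {v for v in remaining if not parents[v] & remaining}
            if not free:
                break  # a directed cycle remains
            remaining -= free
        if not remaining:
            yield parents

FAMILIES = [(a, B) for a in NODES for B in subsets(set(NODES) - {a}, 1)]  # 28 families
SETS = subsets(NODES, 2)                                                  # 11 sets

# Linking matrix of (1), assumed standard form:
# M[S][(a|B)] = 1 iff a lies in S and S minus {a} is contained in B.
M = [[int(a in S and S - {a} <= B) for a, B in FAMILIES] for S in SETS]

def combined_code(parents):
    """Family-variable part eta followed by the characteristic-imset part c."""
    eta = [int(parents[a] == B) for a, B in FAMILIES]
    c = [int(any(S - {i} <= parents[i] for i in S)) for S in SETS]
    return eta + c

def satisfies_link(code):
    """Check the linear relation c = M eta on a combined code."""
    eta, c = code[:len(FAMILIES)], code[len(FAMILIES):]
    return all(sum(m * x for m, x in zip(row, eta)) == value
               for row, value in zip(M, c))

codes = [combined_code(p) for p in all_dags(NODES)]
print(len(codes), len(codes[0]), all(map(satisfies_link, codes)))  # 543 39 True
```

Keeping both parts lets the non-negativity and modified convexity constraints act on the first 28 coordinates while facet-defining inequalities for C act on the last 11, as the paragraph above proposes.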

Some questions remain open. One of them is whether the simple condition of being closed under Markov equivalence characterizes the sets of graph codes belonging to SE faces of F (Conjecture 1). However, the answer to this question does not appear to be essential for the practical application of ILP methods in BN structure learning.
A Combinatorial identity

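Lemma 13 For all integers $s \ge 0$ and $k \ge K \ge 0$, one has

$$\sum_{m=0}^{s} (-1)^m \binom{k+s}{k+m}\binom{m+k-K}{m} = \binom{s+K-1}{K-1} \tag{27}$$

with the conventions $\binom{n}{0} = \binom{n}{n} = 1$ for any $n \in \mathbb{Z}$ and $\binom{n}{-1} = \binom{n}{n+1} = 0$ for any non-negative $n \in \mathbb{Z}$. In particular,

$$\forall\, s \ge 0,\ k \ge 1 \text{ integers:} \qquad \sum_{m=0}^{s} (-1)^m \binom{k+s}{k+m}\binom{m+k-1}{m} = 1. \tag{28}$$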


Proof The proof relies on Pascal's triangle identity

$$\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r},$$

valid for integers $n \ge 1$, $n \ge r \ge 0$.

Let us denote the sum in (27) by $\Sigma(s,k,K)$; the basic idea of the proof is induction on $s+K$. First, we verify (27) in the case $s = 0$:

$$\Sigma(0,k,K) = \binom{k}{k}\binom{k-K}{0} = 1 = \binom{0+K-1}{K-1}.$$

Another easy special case is $s \ge 1$ and $K = 0$, in which case

$$
\begin{aligned}
\Sigma(s,k,0) &= \sum_{m=0}^{s} (-1)^m \binom{k+s}{k+m}\binom{m+k}{m}
= \sum_{m=0}^{s} (-1)^m \frac{(k+s)!}{(k+m)!\,(s-m)!}\cdot\frac{(k+m)!}{m!\,k!}\\
&= \frac{(k+s)!}{k!\,s!}\cdot\sum_{m=0}^{s} (-1)^m \frac{s!}{m!\,(s-m)!}
= \binom{k+s}{k}\cdot(-1+1)^s = 0 = \binom{s-1}{-1}.
\end{aligned}
$$

Thus, (27) holds in the cases $s = 0$ and $K = 0$; in particular, it holds if $s + K \le 1$. In the case $s, K \ge 1$, the induction premise means that (27) holds for all $s', K' \ge 0$ with $s' + K' \le s + K - 1$. To verify the induction step, write

$$\binom{k+s}{k+m} = \binom{k+s-1}{k+m} + \binom{k+s-1}{k+m-1}$$

by Pascal's identity, use $\binom{k+s-1}{k+s} = 0$, apply the induction premise, and use Pascal's identity again:

$$
\begin{aligned}
\Sigma(s,k,K) &= \sum_{m=0}^{s} (-1)^m \binom{k+s}{k+m}\binom{m+k-K}{m}\\
&= \sum_{m=0}^{s-1} (-1)^m \binom{k+s-1}{k+m}\binom{m+k-K}{m} + \sum_{m=0}^{s} (-1)^m \binom{k+s-1}{k+m-1}\binom{m+k-K}{m}\\
&= \Sigma(s-1,k,K) + \Sigma(s,k-1,K-1)
= \binom{s+K-2}{K-1} + \binom{s+K-2}{K-2} = \binom{s+K-1}{K-1},
\end{aligned}
$$

which gives the desired result. Putting $K = 1$ gives (28).
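Identity (27) is easy to test numerically for small parameters, which is also a useful sanity check on the binomial conventions it relies on. The Python sketch below is a check, not a proof: it implements the stated conventions on top of the standard-library `math.comb` and compares both sides of (27) and (28) for all $s, k < 10$. The names `binom`, `sigma` and `rhs27` are illustrative.

```python
from math import comb

def binom(n, r):
    """Binomial coefficient with the conventions stated in Lemma 13."""
    if r == n or r == 0:
        return 1  # binom(n, n) = binom(n, 0) = 1, also for negative n
    if r < 0 or r > n:
        return 0  # binom(n, -1) = binom(n, n + 1) = 0; reached here only with n >= 0
    return comb(n, r)

def sigma(s, k, K):
    """Left-hand side of (27)."""
    return sum((-1) ** m * binom(k + s, k + m) * binom(m + k - K, m)
               for m in range(s + 1))

def rhs27(s, K):
    """Right-hand side of (27)."""
    return binom(s + K - 1, K - 1)

cases = [(s, k, K) for s in range(10) for k in range(10) for K in range(k + 1)]
print(all(sigma(s, k, K) == rhs27(s, K) for s, k, K in cases))             # True
print(all(sigma(s, k, 1) == 1 for s in range(10) for k in range(1, 10)))  # True, (28)
```

The only place where a negative upper index occurs is $s = K = 0$, where the right-hand side is $\binom{-1}{-1} = 1$; every other argument pair is handled by `math.comb` or by the zero convention.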


B SE facets in case of four BN variables 

There exist 37 SE facets of F in the case N = {a,b,c,d}, which decompose into 10 permutation types. Below we list all the types of the inequalities, both in the family-variable mode and in the characteristic-imset mode. The generalized cluster inequalities are indicated by •, the remaining types by o; the latter are also labeled by the notation used in the catalogue from [10].
• the (generalized) cluster inequality for C = {a,b} (and k = 1),

$$[\,\eta(a\mid b) + \eta(a\mid bc) + \eta(a\mid bd) + \eta(a\mid bcd)\,] + [\,\eta(b\mid a) + \eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid acd)\,] \le 1,$$

(6 inequalities of this type), in characteristic imsets

$$c(ab) \le 1,$$

• the generalized cluster inequality for C = {a,b,c} and k = 2,

$$[\,\eta(a\mid bc) + \eta(a\mid bcd)\,] + [\,\eta(b\mid ac) + \eta(b\mid acd)\,] + [\,\eta(c\mid ab) + \eta(c\mid abd)\,] \le 1,$$

(4 inequalities of this type), in characteristic imsets

$$c(abc) \le 1,$$

• the (generalized) cluster inequality for C = {a,b,c} (and k = 1),

$$
\begin{aligned}
&[\,\eta(a\mid b) + \eta(a\mid c) + \eta(a\mid bc) + \eta(a\mid bd) + \eta(a\mid cd) + \eta(a\mid bcd)\,]\\
&+ [\,\eta(b\mid a) + \eta(b\mid c) + \eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid cd) + \eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid a) + \eta(c\mid b) + \eta(c\mid ab) + \eta(c\mid ad) + \eta(c\mid bd) + \eta(c\mid abd)\,] \le 2,
\end{aligned}
$$

(4 inequalities of this type), in characteristic imsets

$$c(ab) + c(ac) + c(bc) - c(abc) \le 2,$$

• the generalized cluster inequality for C = {a,b,c,d} and k = 3,

$$[\,\eta(a\mid bcd) + \eta(b\mid acd) + \eta(c\mid abd) + \eta(d\mid abc)\,] \le 1,$$

(1 inequality of this type), in characteristic imsets

$$c(abcd) \le 1,$$

• the generalized cluster inequality for C = {a,b,c,d} and k = 2,

$$
\begin{aligned}
&[\,\eta(a\mid bc) + \eta(a\mid bd) + \eta(a\mid cd) + \eta(a\mid bcd)\,]\\
&+ [\,\eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid cd) + \eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid ab) + \eta(c\mid ad) + \eta(c\mid bd) + \eta(c\mid abd)\,]\\
&+ [\,\eta(d\mid ab) + \eta(d\mid ac) + \eta(d\mid bc) + \eta(d\mid abc)\,] \le 2,
\end{aligned}
$$

(1 inequality of this type), in characteristic imsets

$$c(abc) + c(abd) + c(acd) + c(bcd) - 2\cdot c(abcd) \le 2,$$

• the (generalized) cluster inequality for C = {a,b,c,d} (and k = 1),

$$
\begin{aligned}
&[\,\eta(a\mid b) + \eta(a\mid c) + \eta(a\mid d) + \eta(a\mid bc) + \eta(a\mid bd) + \eta(a\mid cd) + \eta(a\mid bcd)\,]\\
&+ [\,\eta(b\mid a) + \eta(b\mid c) + \eta(b\mid d) + \eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid cd) + \eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid a) + \eta(c\mid b) + \eta(c\mid d) + \eta(c\mid ab) + \eta(c\mid ad) + \eta(c\mid bd) + \eta(c\mid abd)\,]\\
&+ [\,\eta(d\mid a) + \eta(d\mid b) + \eta(d\mid c) + \eta(d\mid ab) + \eta(d\mid ac) + \eta(d\mid bc) + \eta(d\mid abc)\,] \le 3,
\end{aligned}
$$

(1 inequality of this type), in characteristic imsets

$$c(ab) + c(ac) + c(ad) + c(bc) + c(bd) + c(cd) - c(abc) - c(abd) - c(acd) - c(bcd) + c(abcd) \le 3,$$
o non-cluster SE inequality with 13 terms
[a-a,b-2-acd,c-2-abd,d-2-abc FACETS a|bcd]

$$
\begin{aligned}
&[\,\eta(a\mid bc) + \eta(a\mid bd) + \eta(a\mid cd) + 2\cdot\eta(a\mid bcd)\,] + [\,\eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid ab) + \eta(c\mid ad) + \eta(c\mid abd)\,] + [\,\eta(d\mid ab) + \eta(d\mid ac) + \eta(d\mid abc)\,] \le 2,
\end{aligned}
$$

(4 inequalities of this type), in characteristic imsets

$$c(abc) + c(abd) + c(acd) - c(abcd) \le 2,$$

o non-cluster SE inequality with 16 terms
[a-2-bcd,b-2-acd,c-ab,d-ab FACETS ab|cd]

$$
\begin{aligned}
&[\,\eta(a\mid b) + \eta(a\mid bc) + \eta(a\mid bd) + \eta(a\mid cd) + \eta(a\mid bcd)\,]\\
&+ [\,\eta(b\mid a) + \eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid cd) + \eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid ad) + \eta(c\mid bd) + \eta(c\mid abd)\,] + [\,\eta(d\mid ac) + \eta(d\mid bc) + \eta(d\mid abc)\,] \le 2,
\end{aligned}
$$

(6 inequalities of this type), in characteristic imsets

$$c(ab) + c(acd) + c(bcd) - c(abcd) \le 2,$$

o non-cluster SE inequality with 22 terms
[a-a,a-2-bcd,b-ac,b-ad,c-ab,c-ad,d-ab,d-ac FACETS a|bcd]

$$
\begin{aligned}
&[\,\eta(a\mid b) + \eta(a\mid c) + \eta(a\mid d) + 2\cdot\eta(a\mid bc) + 2\cdot\eta(a\mid bd) + 2\cdot\eta(a\mid cd) + 2\cdot\eta(a\mid bcd)\,]\\
&+ [\,\eta(b\mid a) + \eta(b\mid ac) + \eta(b\mid ad) + \eta(b\mid cd) + \eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid a) + \eta(c\mid ab) + \eta(c\mid ad) + \eta(c\mid bd) + \eta(c\mid abd)\,]\\
&+ [\,\eta(d\mid a) + \eta(d\mid ab) + \eta(d\mid ac) + \eta(d\mid bc) + \eta(d\mid abc)\,] \le 3,
\end{aligned}
$$

(4 inequalities of this type), in characteristic imsets

$$c(ab) + c(ac) + c(ad) + c(bcd) - c(abcd) \le 3,$$

o non-cluster SE inequality with 26 terms
[a-a,a-bcd,b-b,b-acd,c-c,c-ad,c-bd,d-d,d-ac,d-bc FACETS ab|cd]

$$
\begin{aligned}
&[\,\eta(a\mid b) + \eta(a\mid c) + \eta(a\mid d) + \eta(a\mid bc) + \eta(a\mid bd) + 2\cdot\eta(a\mid cd) + 2\cdot\eta(a\mid bcd)\,]\\
&+ [\,\eta(b\mid a) + \eta(b\mid c) + \eta(b\mid d) + \eta(b\mid ac) + \eta(b\mid ad) + 2\cdot\eta(b\mid cd) + 2\cdot\eta(b\mid acd)\,]\\
&+ [\,\eta(c\mid a) + \eta(c\mid b) + \eta(c\mid ab) + \eta(c\mid ad) + \eta(c\mid bd) + 2\cdot\eta(c\mid abd)\,]\\
&+ [\,\eta(d\mid a) + \eta(d\mid b) + \eta(d\mid ab) + \eta(d\mid ac) + \eta(d\mid bc) + 2\cdot\eta(d\mid abc)\,] \le 4,
\end{aligned}
$$

(6 inequalities of this type), in characteristic imsets

$$c(ab) + c(ac) + c(ad) + c(bc) + c(bd) - c(abc) - c(abd) + c(abcd) \le 4.$$
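The list above can be sanity-checked by brute force over all acyclic directed graphs on four nodes. The Python sketch below is illustrative and assumes the standard characteristic imset ($c(S)=1$ iff some $a\in S$ has $S\setminus\{a\}$ among its parents). For the characteristic-imset form of each representative, it checks that the maximum over all characteristic imsets equals the right-hand side, that the inequality is tight at the 1-imset, and that renaming $a,b,c,d$ produces the stated number of inequalities, which add up to 37. The helpers `ineq` and `orbit_size` are hypothetical.

```python
from itertools import combinations, permutations, product

NODES = "abcd"

def subsets(elems, min_size=0):
    """All subsets of `elems` with at least `min_size` elements, as frozensets."""
    elems = sorted(elems)
    return [frozenset(s) for r in range(min_size, len(elems) + 1)
            for s in combinations(elems, r)]

def all_dags(nodes):
    """Yield every acyclic directed graph as a dict node -> frozenset of parents."""
    for combo in product(*[subsets(set(nodes) - {v}) for v in nodes]):
        parents, remaining = dict(zip(nodes, combo)), set(nodes)
        while remaining:  # strip nodes none of whose parents are left
            free = {v for v in remaining if not parents[v] & remaining}
            if not free:
                break  # a directed cycle remains
            remaining -= free
        if not remaining:
            yield parents

SETS = subsets(NODES, 2)

def char_imset(parents):
    """c(S) = 1 iff some i in S has S minus {i} contained in its parent set."""
    return tuple(int(any(S - {i} <= parents[i] for i in S)) for S in SETS)

def ineq(terms):
    """Illustrative helper: {'ab': 1} -> {frozenset('ab'): 1}."""
    return {frozenset(k): v for k, v in terms.items()}

def orbit_size(coeffs):
    """Number of distinct inequalities obtained by renaming a, b, c, d."""
    images = set()
    for perm in permutations(NODES):
        rename = dict(zip(NODES, perm))
        images.add(frozenset((frozenset(rename[x] for x in S), v)
                             for S, v in coeffs.items()))
    return len(images)

# (characteristic-imset coefficients, right-hand side, stated count), in list order.
TYPES = [
    (ineq({"ab": 1}), 1, 6),
    (ineq({"abc": 1}), 1, 4),
    (ineq({"ab": 1, "ac": 1, "bc": 1, "abc": -1}), 2, 4),
    (ineq({"abcd": 1}), 1, 1),
    (ineq({"abc": 1, "abd": 1, "acd": 1, "bcd": 1, "abcd": -2}), 2, 1),
    (ineq({"ab": 1, "ac": 1, "ad": 1, "bc": 1, "bd": 1, "cd": 1,
           "abc": -1, "abd": -1, "acd": -1, "bcd": -1, "abcd": 1}), 3, 1),
    (ineq({"abc": 1, "abd": 1, "acd": 1, "abcd": -1}), 2, 4),
    (ineq({"ab": 1, "acd": 1, "bcd": 1, "abcd": -1}), 2, 6),
    (ineq({"ab": 1, "ac": 1, "ad": 1, "bcd": 1, "abcd": -1}), 3, 4),
    (ineq({"ab": 1, "ac": 1, "ad": 1, "bc": 1, "bd": 1,
           "abc": -1, "abd": -1, "abcd": 1}), 4, 6),
]

imsets = {char_imset(p) for p in all_dags(NODES)}
checks = []
for coeffs, rhs, count in TYPES:
    vec = [coeffs.get(S, 0) for S in SETS]
    best = max(sum(x * y for x, y in zip(vec, c)) for c in imsets)
    # valid with maximum rhs, tight at the 1-imset, expected number of copies
    checks.append((best == rhs, sum(vec) == rhs, orbit_size(coeffs) == count))

print(all(all(t) for t in checks), sum(n for _, _, n in TYPES))  # True 37
```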
C Specific inequalities in case of four BN variables

In the case |N| = 4, the characteristic-imset polytope C has, besides the 37 facets containing the 1-imset listed in §B, 117 additional specific facets that do not contain the 1-imset. They are all defined by means of the so-called specific inequalities discussed in [21, §4.1.2]. Each of these inequalities corresponds to a clutter (= Sperner family) of non-empty subsets of N, that is, to a class of inclusion-incomparable subsets of N. The 117 specific facets decompose into 20 permutation types listed below. Except for the 4 facets of the last type, mentioned earlier in (24), all of them contain the 0-imset.

o $-c(ab) \le 0$ (6 inequalities of this type),
Sperner family is {ab},

o $-c(abc) \le 0$ (4 inequalities of this type),
Sperner family is {abc},

o $-c(abcd) \le 0$ (1 inequality of this type),
Sperner family is {abcd},

o $-c(ab) - c(ac) - c(bc) + 2\cdot c(abc) \le 0$ (4 inequalities of this type),
Sperner family is {ab, ac, bc},

o $-c(ab) - c(acd) - c(bcd) + 2\cdot c(abcd) \le 0$ (6 inequalities of this type),
Sperner family is {ab, acd, bcd},

o $-c(abc) - c(abd) - c(acd) + 2\cdot c(abcd) \le 0$ (4 inequalities of this type),
Sperner family is {abc, abd, acd},

o $-c(abc) - c(abd) - c(acd) - c(bcd) + 3\cdot c(abcd) \le 0$ (1 inequality),
Sperner family is {abc, abd, acd, bcd},

o $-c(ab) - c(ac) - c(ad) - c(bcd) + c(abc) + c(abd) + c(acd) \le 0$ (4 inequalities),
Sperner family is {ab, ac, ad, bcd},

o $-c(ab) - c(ac) - c(ad) - c(bc) - c(bd) + 2\cdot c(abc) + 2\cdot c(abd) + c(acd) + c(bcd) - 2\cdot c(abcd) \le 0$ (6 inequalities),
Sperner family is {ab, ac, ad, bc, bd},

o $-c(ab) - c(ac) - c(ad) - c(bc) - c(bd) - c(cd) + 2\cdot c(abc) + 2\cdot c(abd) + 2\cdot c(acd) + 2\cdot c(bcd) - 3\cdot c(abcd) \le 0$ (1 inequality),
Sperner family is {ab, ac, ad, bc, bd, cd},

o $-c(ab) - c(ac) + c(abc) \le 0$ (12 inequalities of this type),
Sperner family is {ab, ac},

o $-c(abc) - c(abd) + c(abcd) \le 0$ (6 inequalities of this type),
Sperner family is {abc, abd},

o $-c(ab) - c(acd) + c(abcd) \le 0$ (12 inequalities of this type),
Sperner family is {ab, acd},

o $-c(ab) - c(cd) + c(abcd) \le 0$ (3 inequalities of this type),
Sperner family is {ab, cd},

o $-c(ab) - c(ac) - c(ad) + c(abc) + c(abd) + c(acd) - c(abcd) \le 0$ (4 inequalities),
Sperner family is {ab, ac, ad},

o $-c(ab) - c(ac) - c(bd) + c(abc) + c(abd) \le 0$ (12 inequalities of this type),
Sperner family is {ab, ac, bd},

o $-c(ab) - c(ac) - c(bcd) + c(abc) + c(abcd) \le 0$ (12 inequalities of this type),
Sperner family is {ab, ac, bcd},

o $-c(ab) - c(ac) - c(bc) - c(cd) + 2\cdot c(abc) + c(acd) + c(bcd) - c(abcd) \le 0$ (12 inequalities of this type),
Sperner family is {ab, ac, bc, cd},

o $-c(ab) - c(ad) - c(bc) - c(cd) + c(abc) + c(abd) + c(acd) + c(bcd) - c(abcd) \le 0$ (3 inequalities),
Sperner family is {ab, ad, bc, cd},

o $-c(bc) - c(bd) - c(cd) + c(abc) + c(abd) + c(acd) + 2\cdot c(bcd) - 2\cdot c(abcd) \le 1$ (4 inequalities of this type),
Sperner family is {a, bc, bd, cd}.
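The stated numbers of inequalities per type can be recomputed as orbit sizes under permutations of N = {a,b,c,d}. The Python sketch below (helper names hypothetical) encodes the 20 representatives listed above and confirms the total of 117. It also checks that each representative is slack at the 1-imset, where the left-hand side is the sum of all coefficients, and that only the last type has a non-zero right-hand side, i.e., is the only type not tight at the 0-imset.

```python
from itertools import permutations

NODES = "abcd"

def ineq(terms):
    """Illustrative helper: {'ab': -1} -> {frozenset('ab'): -1}."""
    return {frozenset(k): v for k, v in terms.items()}

def orbit_size(coeffs):
    """Number of distinct inequalities obtained by renaming a, b, c, d."""
    images = set()
    for perm in permutations(NODES):
        rename = dict(zip(NODES, perm))
        images.add(frozenset((frozenset(rename[x] for x in S), v)
                             for S, v in coeffs.items()))
    return len(images)

# (coefficients, right-hand side, stated number of inequalities), in list order.
TYPES = [
    (ineq({"ab": -1}), 0, 6),
    (ineq({"abc": -1}), 0, 4),
    (ineq({"abcd": -1}), 0, 1),
    (ineq({"ab": -1, "ac": -1, "bc": -1, "abc": 2}), 0, 4),
    (ineq({"ab": -1, "acd": -1, "bcd": -1, "abcd": 2}), 0, 6),
    (ineq({"abc": -1, "abd": -1, "acd": -1, "abcd": 2}), 0, 4),
    (ineq({"abc": -1, "abd": -1, "acd": -1, "bcd": -1, "abcd": 3}), 0, 1),
    (ineq({"ab": -1, "ac": -1, "ad": -1, "bcd": -1,
           "abc": 1, "abd": 1, "acd": 1}), 0, 4),
    (ineq({"ab": -1, "ac": -1, "ad": -1, "bc": -1, "bd": -1, "abc": 2,
           "abd": 2, "acd": 1, "bcd": 1, "abcd": -2}), 0, 6),
    (ineq({"ab": -1, "ac": -1, "ad": -1, "bc": -1, "bd": -1, "cd": -1, "abc": 2,
           "abd": 2, "acd": 2, "bcd": 2, "abcd": -3}), 0, 1),
    (ineq({"ab": -1, "ac": -1, "abc": 1}), 0, 12),
    (ineq({"abc": -1, "abd": -1, "abcd": 1}), 0, 6),
    (ineq({"ab": -1, "acd": -1, "abcd": 1}), 0, 12),
    (ineq({"ab": -1, "cd": -1, "abcd": 1}), 0, 3),
    (ineq({"ab": -1, "ac": -1, "ad": -1, "abc": 1, "abd": 1,
           "acd": 1, "abcd": -1}), 0, 4),
    (ineq({"ab": -1, "ac": -1, "bd": -1, "abc": 1, "abd": 1}), 0, 12),
    (ineq({"ab": -1, "ac": -1, "bcd": -1, "abc": 1, "abcd": 1}), 0, 12),
    (ineq({"ab": -1, "ac": -1, "bc": -1, "cd": -1, "abc": 2,
           "acd": 1, "bcd": 1, "abcd": -1}), 0, 12),
    (ineq({"ab": -1, "ad": -1, "bc": -1, "cd": -1, "abc": 1,
           "abd": 1, "acd": 1, "bcd": 1, "abcd": -1}), 0, 3),
    (ineq({"bc": -1, "bd": -1, "cd": -1, "abc": 1, "abd": 1,
           "acd": 1, "bcd": 2, "abcd": -2}), 1, 4),
]

orbits_ok = all(orbit_size(co) == n for co, _, n in TYPES)
total = sum(n for _, _, n in TYPES)
slack_at_one = all(sum(co.values()) < rhs for co, rhs, _ in TYPES)  # 1-imset value
not_tight_at_zero = [i for i, (_, rhs, _) in enumerate(TYPES, 1) if rhs != 0]
print(orbits_ok, total, slack_at_one, not_tight_at_zero)  # True 117 True [20]
```

Together with the 37 facets of §B this accounts for all 154 facets of C mentioned in Example 4.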





